Microassay for serial analysis of gene expression and applications thereof



(57) Method of obtaining a library of tags able to define a specific state of a biological sample, comprising the following successive steps: (1) extracting in a single step mRNA from a small amount of a biological sample using oligo(dT)25 covalently bound to paramagnetic beads, (2) generating a double-strand cDNA library from said mRNA, (3) cleaving the obtained cDNAs using Sau3A I, (4) separating the cleaved cDNAs into two aliquots, (5) ligating the cDNA contained in each of said two aliquots via said Sau3A I restriction site to a linker consisting of one double-strand cDNA molecule having one of the following formulas: GATCGTCCC-X1 or GATCGTCCC-X2, wherein X1 and X2, which comprise 30-37 nucleotides and are different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, (6) digesting the products obtained in step (5) with the tagging enzyme BsmF I, (7) blunt-ending said BsmF I tags with a DNA polymerase and mixing the tags ligated with the different linkers, (8) ligating the tags obtained in step (7) to form ditags with a DNA ligase, (9) amplifying the ditags obtained in step (8) with primers comprising 20-25 bp and having a Tm of 55°C-65°C, (10) isolating the ditags having between 20 and 28 bp from the amplification products obtained in step (9) by digesting said amplification products with Sau3A I and separating the digested products, (11) ligating the ditags obtained in step (10) to form concatemers, purifying said concatemers and separating the concatemers having more than 300 bp, (12) cloning and sequencing said concatemers, and (13) analysing the different obtained tags.
Description



[0001] Several methods are now available for monitoring gene expression on a genomic scale. These include DNA microarrays (1, 2) and macroarrays (3, 4), expressed sequence tag (EST) determination (5, 6), and serial analysis of gene expression (SAGE) (7). Such methods have been designed, and are still used, for analysing macroamounts of biological material (1-5 µg of poly(A) mRNA, i.e. ~10⁷ cells). However, mammalian tissues consist of several different cell types with specific physiological functions and gene expression patterns, which complicates the interpretation of large-scale expression data in higher organisms. It is therefore highly desirable to develop methods suitable for the analysis of defined cell populations.

[0002] SAGE has been shown to provide rapid and detailed information on transcript abundance and diversity (7-10). It involves several steps for mRNA purification, cDNA tag generation and isolation, and PCR amplification. We reasoned that increasing the yield of the various extraction procedures, together with slight modifications in the number of PCR cycles, could extend the scope of SAGE. Here we present a microadaptation of SAGE, referred to as SADE (11), which, in contrast to the original method, provides quantitative gene expression data from a small number (30,000-50,000) of cells.

[0003] SAGE was first described by Velculescu et al. in 1995 (US Patent 5 695 937 and 7) and rests on three principles, all of which have since been corroborated experimentally: a) short nucleotide sequence tags (10 bp) are long enough to be specific to a transcript, especially if they are isolated from a defined portion of each transcript; b) concatenating several tags within a single DNA molecule greatly increases the throughput of data acquisition; c) the quantitative recovery of transcript-specific tags makes it possible to establish representative gene expression profiles.

[0004] However, this method was designed to study macroamounts of biological material (5 µg of poly(A) RNA, i.e. about 10⁷ cells). Since mammalian tissues consist of several different cell types with specific physiological functions and gene expression patterns, it is highly desirable to scale down the SAGE approach for studying well-delineated tissue fragments or isolated cell populations.

[0005] The inventors have now found a new method able to handle microamounts of sample.

[0006] The subject of the present invention is a method of obtaining a library of tags able to define a specific state of a biological sample, such as a tissue or a cell culture, characterised in that it comprises the following successive steps:

(1) extracting in a single step mRNA from a small amount of a biological sample using oligo(dT)25 covalently bound to paramagnetic beads,

(2) generating a double-strand cDNA library from said mRNA according to the following steps:

* synthesising the 1st strand of said cDNA by reverse transcription of said mRNA template into a 1st complementary single-strand cDNA, using a reverse transcriptase lacking RNase H activity,
* synthesising the 2nd strand of said cDNA by nick translation of the mRNA in the mRNA-cDNA hybrid, using an E. coli DNA polymerase,

(3) cleaving the obtained cDNAs using the restriction endonuclease Sau3A I as anchoring enzyme, 

(4) separating the cleaved cDNAs in two aliquots, 

(5) ligating the cDNA contained in each of said two aliquots via said Sau3A I restriction site to a linker consisting of one double-strand cDNA molecule having one of the following formulas:

GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and

wherein GATCGTCCC corresponds to a Sau3A I restriction site joined to a BsmF I restriction site,

(6) digesting the products obtained in step (5) with the tagging enzyme BsmF I and releasing linkers with an anchored short piece of cDNA corresponding to a transcript-specific tag, said digestion generating BsmF I tags specific to the initial mRNA,

(7) blunt-ending said BsmF I tags with a DNA polymerase, preferably T7 DNA polymerase or Vent polymerase, and mixing the tags ligated with the different linkers,
(8) ligating the tags obtained in step (7) to form ditags with a DNA ligase,

(9) amplifying the ditags obtained in step (8) with primers comprising 20-25 bp and having a Tm of 55°C-65°C,

(10) isolating the ditags having between 20 and 28 bp from the amplification products obtained in step (9) by digesting said amplification products with the anchoring enzyme Sau3A I and separating the digested products by appropriate gel electrophoresis,

(11) ligating the ditags obtained in step (10) to form concatemers, purifying said concatemers and separating the concatemers having more than 300 bp,

(12) cloning and sequencing said concatemers, and
(13) analysing the different obtained tags.

[0007] According to an advantageous embodiment of said method, in step (2), said synthesis of the 1st strand of said cDNA is performed with Moloney Murine Leukaemia Virus reverse transcriptase (M-MLV RT), using oligo(dT)25 as primer.

[0008] According to another advantageous embodiment of said method, the linkers of step (5) are preferably hybrid DNA molecules formed from linkers 1A and 1B or from linkers 2A and 2B, having the following sequences:

linker 1A: 5'-ttttgccaggtcactcaagtcggtcattcatgtcagcacagggac-3'
linker 1B: 5'-gatcgtccctgtgctgacatgaatgaccgacttgagtgacctggca-3'

or

linker 2A: 5'-tttttgctcaggctcaaggctcgtctaatcacagtcggaagggac-3'
linker 2B: 5'-gatcgtcccttccgactgtgattagacgagccttgagcctgagcaa-3'.
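The two linker pairs listed above can be checked in silico. The sketch below (plain Python; the helper names are illustrative and not part of the described method) verifies that each B strand starts with the GATCGTCCC motif, that the 42 bases following its GATC overhang are the reverse complement of the 3' end of the A strand, and reports any unpaired 5' bases on the A strand.

```python
# In-silico check of the linker pairs (sequences copied from the text above).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


LINKERS = {
    "1": ("ttttgccaggtcactcaagtcggtcattcatgtcagcacagggac",    # linker 1A
          "gatcgtccctgtgctgacatgaatgaccgacttgagtgacctggca"),  # linker 1B
    "2": ("tttttgctcaggctcaaggctcgtctaatcacagtcggaagggac",    # linker 2A
          "gatcgtcccttccgactgtgattagacgagccttgagcctgagcaa"),  # linker 2B
}
MOTIF = "gatcgtccc"  # Sau3A I overhang (GATC) joined to the BsmF I site (GTCCC)


def check_linker(a: str, b: str) -> dict:
    overhang, duplex_b = b[:4], b[4:]    # 5'-GATC sticky end, then paired region
    paired = revcomp(duplex_b)           # sequence strand A must carry to anneal
    return {
        "overhang": overhang,
        "matches_formula": b.startswith(MOTIF),
        "x_length": len(b) - len(MOTIF),             # length of X1 / X2
        "a_anneals": a.endswith(paired),
        "a_5prime_extra": a[: len(a) - len(paired)],  # unpaired 5' bases of A
        "bsmfi_site": "gggac" in a,                   # GGGAC on the A strand
    }


for name, (a, b) in LINKERS.items():
    print(f"linker pair {name}:", check_linker(a, b))
```

With these sequences, both pairs should report a GATC overhang on the B strand, an X region of 37 nucleotides (within the 30-37 nt range of step (5)) and a 3-nt 5' T extension on the A strand.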



[0009] According to another advantageous embodiment of said method, the amount of each linker in step (5) is at most 8-10 pmol, and preferably between 0.5 pmol and 8 pmol for initial amounts of 10-40 ng and 5 µg of mRNA, respectively.
[0010] According to yet another advantageous embodiment of said method, the primers of step (9) preferably have the following sequences:

5'-GCCAGGTCACTCAAGTCGGTCATT-3'

5'-TGCTCAGGCTCAAGGCTCGTCTA-3'
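As a rough consistency check, each primer can be located in the corresponding linker A strand and its Tm estimated. The sketch uses the simple GC-content approximation Tm ~ 64.9 + 41 x (nGC - 16.4) / N; this formula is an assumption chosen for illustration (the text does not say how Tm was determined), and salt-corrected or nearest-neighbour calculators will give somewhat different values.

```python
# Locate each primer in its linker A strand and estimate Tm (illustrative only).
LINKER_A = {
    "1": "ttttgccaggtcactcaagtcggtcattcatgtcagcacagggac",
    "2": "tttttgctcaggctcaaggctcgtctaatcacagtcggaagggac",
}
PRIMERS = {
    "1": "GCCAGGTCACTCAAGTCGGTCATT",
    "2": "TGCTCAGGCTCAAGGCTCGTCTA",
}


def basic_tm(primer: str) -> float:
    """Approximate melting temperature (deg C) from length and GC count."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(p)


for name, primer in PRIMERS.items():
    pos = LINKER_A[name].upper().find(primer)   # -1 would mean "not found"
    print(f"primer {name}: {len(primer)} nt, starts at index {pos} "
          f"of linker {name}A, Tm ~ {basic_tm(primer):.1f} C")
```

Under this approximation both primers come out close to 59°C, inside the 55°C-65°C window stated above, and each starts at the fifth base of its linker A strand.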



[0011] According to yet another advantageous embodiment of said method, the biological sample of step (1) preferably comprises ≤ 5×10⁶ cells, corresponding to at most 50 µg of total RNA or 1 µg of poly(A) RNA.

[0012] According to the invention, "biological sample" means, for instance, a tissue or cells (native or cultured), which are lysed to extract mRNA.

[0013] According to another advantageous embodiment of said method, said tissue sample is from kidney, more specifically from nephron segments corresponding to about 15,000 to 45,000 cells, i.e. 0.15-0.45 µg of total RNA.

[0014] The subject of the present invention is also the use of a library of tags obtained according to the method as defined above for assessing the state of a biological sample, such as a tissue or a cell culture.
[0015] The subject of the present invention is also the use of the tags obtained according to the method as defined above as probes.

[0016] The subject of the present invention is also a method of determining a gene expression profile, characterised in that it comprises:

performing steps (1) to (13) according to claim 1, and
translating cDNA tag abundance into a gene expression profile.



[0017] According to a preferred embodiment of said method, the gene expression profile obtained in mouse outer medullary collecting duct (OMCD) and in mouse medullary thick ascending limb (MTAL) is as specified in Table I below:



EP 1 024 201 A1



Table I

OMCD  MTAL  Tag         GenBank match
99    2     GTGGCAGTGG  EST (AA097074) similar to rat AQP-2 (D13906)
34    1     TTATAATTTG  ESTs
27    0     TGGCAGTGGG  No match
19    5     TGACTCCCTC  B2 repetitive sequence
13    0     AAGTTTAAAT  Thymosin beta-4 (X16053)
13    1     AGCAAGCAGG  β-actin (X03672)
13    4     CAAAAAGCTA  ESTs, similar to rat ribosomal protein L11 (X62146)
11    1     ACATTCCTTA  ESTs
11    14    ACCGACCGCA  Integral membrane protein 2B1 (U76253)
11    0     CAGAAGAAGT  Endogenous murine leukemia virus (M17326)
10    5     AAATAAAGTT  Lactate dehydrogenase 2, B chain (X51905)
10    0     AGAAGCAGTG  EST 750555 (AA472938)
10    6     TGATGCCCTC  B2 repetitive sequence
9     4     AGGCTACTAC  Ribosomal protein L27a (X05021)
9     11    GCTCATTGGA  ESTs
9     6     GCTTTCAGCA  ESTs, similar to human extracellular proteinase inhibitor homologue (X63187)
9     14    GTGACTGGGT  CytC oxidase subunit IV (X54691)
9     0     TGACCAAGGC  11β-hydroxysteroid dehydrogenase type 2 (X90647)
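Translating the counts of Table I into an expression profile ([0016]) amounts to normalising each tag count by library size. The sketch below assumes 5,000 sequenced tags per library, as stated in the legend of Figure 5, and adds a pseudocount of 1 so that tags absent from one library still give a finite ratio; the pseudocount and the function names are illustrative choices, not part of the described method, and no statistical test is implied.

```python
# Normalise raw SADE tag counts and compare two libraries (illustrative sketch).
LIBRARY_SIZE = {"OMCD": 5000, "MTAL": 5000}   # assumed tags sequenced per library

# (tag, OMCD count, MTAL count) for a few rows of Table I
ROWS = [
    ("GTGGCAGTGG", 99, 2),    # EST similar to rat AQP-2
    ("ACCGACCGCA", 11, 14),   # integral membrane protein 2B1
    ("TGACCAAGGC", 9, 0),     # 11beta-hydroxysteroid dehydrogenase type 2
]


def per_thousand(count: int, library: str) -> float:
    """Tag abundance expressed per 1,000 sequenced tags."""
    return 1000.0 * count / LIBRARY_SIZE[library]


def ratio(omcd: int, mtal: int, pseudo: int = 1) -> float:
    """Library-size-normalised OMCD/MTAL ratio with a pseudocount."""
    a = (omcd + pseudo) / LIBRARY_SIZE["OMCD"]
    b = (mtal + pseudo) / LIBRARY_SIZE["MTAL"]
    return a / b


for tag, omcd, mtal in ROWS:
    print(tag, per_thousand(omcd, "OMCD"), per_thousand(mtal, "MTAL"),
          round(ratio(omcd, mtal), 2))
```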



[0018] The invention also relates to a kit useful for detecting a gene expression profile, characterised in that the presence of a cDNA tag, obtained from the mRNA extracted from a biological sample, is indicative of expression of a gene having said tag sequence at an appropriate position, i.e. immediately adjacent to the most 3' Sau3A I site in said cDNA obtained from said mRNA, the kit comprising, in addition to the usual buffers for cDNA synthesis, restriction enzyme digestion, ligation and amplification:

- containers containing a linker consisting of one double-strand cDNA molecule having one of the following formulas:

GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and

wherein GATCGTCCC corresponds to a Sau3A I restriction site joined to a BsmF I restriction site, and
- containers containing primers comprising 20-25 bp and having a Tm of 55°C-65°C.
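The tag definition above (the sequence immediately adjacent to the most 3' Sau3A I site) lends itself to in-silico tag prediction, for example to anticipate which tag a known transcript should yield. A minimal sketch, assuming the cDNA is given as the sense strand written 5' to 3' and a 10-bp tag length as in Table I; the function name `virtual_tag` and the toy sequence are hypothetical (the toy sequence is invented, not a real transcript).

```python
from typing import Optional

ANCHOR = "GATC"   # Sau3A I recognition site (anchoring enzyme)
TAG_LEN = 10      # tag length assumed from Table I


def virtual_tag(cdna: str, tag_len: int = TAG_LEN) -> Optional[str]:
    """Return the tag adjacent to the most 3' GATC, or None if none can be made."""
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR)                 # most 3' anchoring-enzyme site
    if pos == -1:
        return None                         # no site: transcript yields no tag
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None   # site too close to the 3' end


toy = "ATGGATCAAACCCGGGTTTGATCGTGGCAGTGGACTTAAAAAAAAAA"   # invented sequence
print(virtual_tag(toy))
```

Taking the last GATC rather than the first mirrors the fact that only the 3'-most Sau3A I fragment stays attached to the oligo(dT) beads.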
[0019] According to an advantageous embodiment of said kit, it preferably contains:

- containers containing hybrid DNA molecules formed from linkers 1A and 1B or from linkers 2A and 2B, having the following sequences:

linker 1A: 5'-ttttgccaggtcactcaagtcggtcattcatgtcagcacagggac-3'
linker 1B: 5'-gatcgtccctgtgctgacatgaatgaccgacttgagtgacctggca-3'

linker 2A: 5'-tttttgctcaggctcaaggctcgtctaatcacagtcggaagggac-3'
linker 2B: 5'-gatcgtcccttccgactgtgattagacgagccttgagcctgagcaa-3'

and

- containers containing the following primers:

5'-GCCAGGTCACTCAAGTCGGTCATT-3'

5'-TGCTCAGGCTCAAGGCTCGTCTA-3'.



[0020] As compared to SAGE, the instant SADE method includes the following features: 1) single-step mRNA purification from tissue lysate; 2) use of a reverse transcriptase lacking RNase H activity; 3) use of a different anchoring enzyme; 4) modification of procedures for blunt-ending cDNA tags; 5) design of new linkers and PCR primers.

[0021] Figure 1, modified from the original studies of Velculescu et al., summarises the different steps of the SADE method, which is a microadaptation of SAGE. Briefly, as already specified above, mRNAs are extracted using oligo(dT)25 covalently bound to paramagnetic beads. Double-strand cDNA is synthesised from mRNA using oligo(dT)25 as primer for 1st strand synthesis. The cDNA is then cleaved using a restriction endonuclease (anchoring enzyme: Sau3A I) with a 4-bp recognition site. Since such an enzyme cleaves DNA molecules on average every 256 bp (4⁴), virtually all cDNAs are predicted to be cleaved at least once. The 3' end of each cDNA, which remains attached to the paramagnetic beads, is isolated magnetically and divided in half. Each of the two aliquots is ligated via the anchoring enzyme restriction site to one of two linkers containing a type IIS recognition site (tagging enzyme: BsmF I) and a priming site for PCR amplification. Type IIS restriction endonucleases cleave at a defined distance from their recognition site (14 bp for BsmF I), irrespective of the intervening sequence. Digestion with the type IIS restriction enzyme thus releases linkers with an anchored short piece of cDNA, corresponding to a transcript-specific tag. After blunt-ending, the tags from the two aliquots are ligated together to form ditags, which are amplified by PCR. Since all targets are of the same length (110 bp) and are amplified with the same primers, potential distortions introduced by PCR are greatly reduced. Furthermore, these distortions can be evaluated and the data corrected accordingly (7, 8). Ditags present in the PCR products are recovered through digestion with the anchoring enzyme and gel purification, then concatenated and cloned.
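The 110-bp target length can be reproduced from the linker and primer sequences given in [0008] and [0010]. The sketch assumes that BsmF I cuts 10/14 nucleotides 3' of its GGGAC site and that blunt-ending fills in the resulting 4-nt overhang, so each linker retains GATC plus 10 bases of cDNA; the product then spans 41 + 14 + 14 + 41 = 110 bp between the primer ends. The two 10-bp tags and all helper names are arbitrary examples.

```python
# Reconstruct the expected step (9) PCR product (illustrative sketch).
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.upper().translate(COMP)[::-1]


LINKER_1A = "TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC"
LINKER_2A = "TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC"
PRIMER_1 = "GCCAGGTCACTCAAGTCGGTCATT"
PRIMER_2 = "TGCTCAGGCTCAAGGCTCGTCTA"


def linker_tag(linker_a: str, tag10: str) -> str:
    """Top strand of a blunt-ended linker-tag: ...GGGAC + GATC + 10-bp tag."""
    return linker_a + "GATC" + tag10


def pcr_product(tag_a: str, tag_b: str) -> str:
    """Ditag flanked by both linkers, trimmed to the primer ends."""
    left = linker_tag(LINKER_1A, tag_a)
    right = revcomp(linker_tag(LINKER_2A, tag_b))   # tags are joined tail to tail
    molecule = left + right
    start = molecule.find(PRIMER_1)
    end = molecule.find(revcomp(PRIMER_2)) + len(PRIMER_2)
    return molecule[start:end]


product = pcr_product("GTGGCAGTGG", "ACCGACCGCA")
print(len(product))
```

In this model, Sau3A I digestion of the product (step 10) would release the central 20-bp ditag flanked by GATC ends.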

[0022] In the SAGE method, mRNAs are isolated using conventional methods, then hybridized to biotinylated oligo(dT) for cDNA synthesis. After cleavage with the anchoring enzyme Nla III, the biotinylated cDNA fraction (3' end) is purified by binding to streptavidin beads.

[0023] In the SADE method, mRNAs are directly isolated from the tissue lysate through hybridization to oligo(dT) covalently bound to magnetic beads. All subsequent steps of the experiment (up to step 3 of protocol 5, as described hereafter) are then performed on magnetic beads. This procedure saves time in the initial part of the experiment and, more importantly, provides better recovery. Quantitative analysis of the cDNA amounts available for library construction revealed dramatic differences between SAGE and SADE. With the SAGE method, starting from 500 mg of tissue, 1.7 µg of cDNA was obtained, of which only 4 ng bound to streptavidin beads after Sau3A I digestion. With the SADE method, starting from 250 mg of tissue, 3.2 µg of cDNA was synthesised on beads, and 0.5 µg remained bound after Sau3A I cleavage. This increase in yield per mg of tissue (×250) explains the success in constructing libraries from as few as 30,000 cells. Using different sources of oligo(dT) led to poor cDNA recoveries. This may be explained by the fact that the binding capacity of streptavidin beads can be altered by several parameters, such as the presence of phenol, the length and composition of the biotinylated DNA fragments, and the length of the spacer between the oligo(dT) and the biotin molecule.
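The ×250 factor follows from normalising the bead-bound cDNA remaining after Sau3A I digestion to the amount of starting tissue, using only the figures quoted above:

```python
# Yield comparison from [0023]: bead-bound cDNA after Sau3A I, per mg of tissue.
sage_ng, sage_mg = 4.0, 500.0      # SAGE: 4 ng retained, 500 mg tissue
sade_ng, sade_mg = 500.0, 250.0    # SADE: 0.5 ug (500 ng) retained, 250 mg tissue

sage_per_mg = sage_ng / sage_mg    # 0.008 ng per mg
sade_per_mg = sade_ng / sade_mg    # 2 ng per mg
fold = sade_per_mg / sage_per_mg

print(f"SADE/SAGE yield per mg tissue: {fold:.0f}x")
```

Without the per-mg normalisation, the raw ratio would be 125 (500 ng vs 4 ng); the factor of two comes from SADE starting with half as much tissue.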

[0024] Another important difference between SAGE and SADE concerns the selected anchoring enzyme. Although any restriction enzyme with a 4-bp recognition site could serve as anchoring enzyme, Sau3A I was preferred to Nla III (7-10) or other enzymes in our studies. Several cDNA libraries used for large-scale sequencing are constructed by vector priming, followed by cDNA cleavage with Mbo I (an isoschizomer of Sau3A I that does not cut the methylated vector DNA) and circularisation (6). SADE tags therefore correspond to the cDNA 5' ends of these libraries, which makes it possible to use EST databases more efficiently to analyse the data.
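The statement that a 4-bp cutter cleaves on average every 4⁴ = 256 bp assumes equal and independent base frequencies. The sketch below checks this on a random sequence and estimates the chance that a cDNA of a given length carries no site at all; real cDNAs deviate from the random model, so the numbers are indicative only, and the function names are illustrative.

```python
import random


def mean_site_spacing(site: str, length: int = 1_000_000, seed: int = 0) -> float:
    """Average distance between occurrences of `site` in a random sequence."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=length))
    n_sites = seq.count(site)   # GATC cannot overlap itself, so count() is exact
    return length / n_sites


def prob_uncut(cdna_length: int, site_length: int = 4) -> float:
    """Approximate probability that a random sequence contains no site,
    treating each start position as an independent match with p = 4**-k."""
    p = 4.0 ** -site_length
    positions = cdna_length - site_length + 1
    return (1.0 - p) ** positions


print(4 ** 4, round(mean_site_spacing("GATC")))
print(f"P(no GATC in 1,500 bp) ~ {prob_uncut(1500):.4f}")
```

Under this model, fewer than 1% of 1.5-kb cDNAs would lack a Sau3A I site, consistent with "virtually all cDNAs" being cleaved.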

[0025] In addition to the preceding arrangements, the invention further comprises other arrangements, which will 
emerge from the description which follows, which refers to examples for carrying out the process which is the subject 
of the present invention as well as to the accompanying drawings, in which: 



- Figure 1: Outline of procedures for constructing SADE libraries. Poly(A) RNAs are isolated from tissue lysate using oligo(dT)25 covalently linked to paramagnetic beads, and cDNA is synthesised under solid-phase conditions. Bold characters correspond to biologically relevant sequences, whereas light characters represent linker-derived sequences. The anchoring enzyme (AE) is Sau3A I, whereas the tagging enzyme (TE) is BsmF I. See text for details.

- Figure 2: Gel analysis of cDNAs synthesised from different amounts of tissue. Poly(A) RNAs were isolated from the indicated amounts of mouse kidney, and cDNAs were synthesised and Sau3A I-digested on paramagnetic beads (see protocols 1-2). cDNAs released from the beads were recovered, and half of the material obtained from each reaction was analysed on a 1% agarose gel stained with ethidium bromide. Positions of molecular weight markers are indicated in bp: left, λ BstE II digest; right, pBR Msp I digest.

- Figure 3: PCR amplification of ditags. Poly(A) RNAs were isolated from 50 or 150 mm of microdissected nephron segments (corresponding to about 15,000 and 45,000 cells, respectively). The corresponding ditags were amplified by PCR using the indicated number of cycles and analysed on a 3% agarose gel stained with ethidium bromide. The expected product (linkers + 1 ditag) is 110 bp long. The molecular weight marker (M) is a 10-bp DNA ladder (Life Technologies).

- Figure 4: Gel analysis of concatemers. Ditags were concatenated by ligation (2 h at 16°C), then electrophoresed through an 8% polyacrylamide gel. The gel was post-stained with SYBR Green I and visualised by UV illumination at 305 nm. Migration of the molecular weight marker (λ BstE II digest) is indicated on the right.

- Figure 5: Comparison of gene expression levels in two nephron portions of the mouse kidney. SADE libraries were constructed from ~50,000 cells isolated by microdissection from medullary collecting ducts or medullary thick ascending limbs, and 5,000 tags were sequenced in each case. The data show the 18 most abundant collecting duct tags originating from nuclear transcripts (mitochondrial tags were excluded from the analysis) and their corresponding abundance in the thick ascending limb library.

[0026] It should be understood, however, that these examples are given solely by way of illustration of the subject 
of the invention and do not in any way constitute a limitation thereto. 




Example:



1. Tissue sampling and mRNA isolation 



1.1 Tissue sampling and lysis 


[0027] The initial steps of library construction require the usual precautions recommended for experiments carried out with RNAs (12). In addition, since library construction involves large-scale PCR (Protocol 6), care must be taken to avoid contamination from previous libraries. Working under PCR-grade conditions is especially important when low amounts of tissue or cells are used.

[0028] Starting from whole tissues (e.g. kidney, liver, brain), the following procedure may be routinely used. After animal anaesthesia or decapitation, the tissue is removed as quickly as possible, rapidly rinsed in ice-cold phosphate-buffered saline, sliced into ~50 mg pieces, and frozen in liquid nitrogen. The frozen sample is then ground to a fine powder under liquid nitrogen using a mortar and pestle, transferred into lysis binding buffer (protocol 7), and homogenised with a Dounce tissue disrupter. To avoid loss of material, small samples (<20 mg) can be transferred without prior freezing into the lysis binding buffer and homogenised in a 1 ml Dounce. The respective amounts of tissue and lysis binding buffer needed for a variety of conditions are indicated in Table II.



Table II. Small and large scale mRNA isolation and cDNA synthesis

Tissue / cells        Lysis binding  Oligo(dT)   Reaction volume (µl)
                      buffer (ml)    beads (µl)  1st strand  2nd strand
250 mg / 3×10⁷        5.50           600         50          400
30 mg / 3×10⁶-6×10⁶   0.70           100         50          400
4 mg / 10⁵-10⁶        0.10           30          25          200
0.5 mg / 3×10⁴-10⁵    0.05-0.10      20          25          200
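For planning an experiment, Table II can be encoded as data. The selection rule in this sketch (use the smallest scale whose tissue amount is at least the sample amount) is an assumption made for illustration; the text does not state how intermediate amounts should be handled, and the `Scale` class and `pick_scale` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    tissue_mg: float
    lysis_buffer_ml: str      # kept as text because one entry is a range
    beads_ul: int
    first_strand_ul: int
    second_strand_ul: int


# Rows of Table II, smallest scale first
TABLE_II = [
    Scale(0.5, "0.05-0.10", 20, 25, 200),
    Scale(4, "0.10", 30, 25, 200),
    Scale(30, "0.70", 100, 50, 400),
    Scale(250, "5.50", 600, 50, 400),
]


def pick_scale(sample_mg: float) -> Scale:
    """Return the smallest Table II scale that covers the sample (assumed rule)."""
    for scale in TABLE_II:
        if sample_mg <= scale.tissue_mg:
            return scale
    raise ValueError("sample larger than the largest scale in Table II")


print(pick_scale(3))
```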



[0029] Starting from isolated or cultured cells, the procedure is much more rapid. The cell suspension, maintained in appropriate culture or survival medium, just needs to be centrifuged at 600-1,200 g for 5 min. After removal of the supernatant, the lysis binding buffer is added onto the cell pellet, and the sample is homogenised by vortexing. This procedure has been successfully applied to 3×10⁴-3×10⁷ cells (Table II).
1.2. mRNA isolation

[0030] Protocols 1-7 describe the generation of a SADE library from 0.5 mg of tissue. The amounts of cDNA recovered correspond to an experiment carried out on the mouse kidney; slightly different amounts are expected from other tissues, according to their mRNA content. The procedures described herein have been repeatedly used without modification with 3×10⁴-10⁵ isolated cells. Since some applications can be performed on large amounts of tissue or cells, protocol adaptations and anticipated results for these kinds of experiments are also provided.
[0031] In the initial experiments, RNAs were extracted using standard methods (13), and poly(A) RNAs were isolated on oligo(dT) columns. Besides being time-consuming, this procedure provides low and variable mRNA amounts and cannot easily be scaled down. The alternative procedure described here (use of oligo(dT)25 covalently linked to paramagnetic beads) is a single-tube assay for mRNA isolation from tissue lysate. In our hands, it yields 4-times higher mRNA amounts than standard methods. Kits and helpful instructions for mRNA isolation with oligo(dT) beads can be obtained from Dynal. Handling of these beads is relatively simple, but care must be taken to avoid centrifugation, drying or freezing, since all three processes are expected to lower their binding capacity. On the other hand, beads can be resuspended by gentle vortexing or pipetting without extreme precautions.

Protocol 1. mRNA purification
[0032] Equipment and reagents

Appropriate tissue or cells.

Dynabead mRNA direct kit (Dynal, ref. 610-11) containing Dynabeads oligo(dT)25, lysis binding buffer and washing buffers.

5X reverse transcription (RT) buffer (250 mM Tris-HCl (pH 8.3), 375 mM KCl, 15 mM MgCl2), provided with the cDNA synthesis kit (see protocol 2).

Magnetic Particle Concentrator (MPC) for 1.5 ml tubes (Dynal, ref. 12004).
Glycogen for molecular biology (Boehringer Mannheim, ref. 901393).

[0033] Method

1. Lyse the tissue sample in 100 µl lysis binding buffer supplemented with 10 µg glycogen.

2. Add 20 µl of Dynabeads to a 1.5 ml tube and condition them according to the manufacturer's instructions.

3. Using the MPC, remove the supernatant from the Dynabeads and add the tissue lysate (100 µl). Mix by vortexing and anneal mRNAs to the beads by incubating 10 min at room temperature.

4. Place the tube in the MPC for 2 to 5 min and remove the supernatant. The mRNAs are now fixed on the beads.

5. Using the MPC, perform the following washes (all buffers contain 20 µg/ml glycogen): twice with 200 µl washing buffer containing lithium dodecyl sulfate (LiDS), 3 times with 200 µl washing buffer, and twice with 200 µl ice-cold 1X RT buffer. Resuspend the beads by pipetting, transfer the suspension into a fresh 1.5 ml tube, wash once with 200 µl ice-cold 1X RT buffer and immediately proceed to protocol 2. The mRNAs on the beads are now ready for 1st strand cDNA synthesis.

1.3 mRNA integrity and purity 

[0034] Before generating a cDNA library, it is generally advised to check mRNA integrity by Northern blot analysis. However, this control experiment consumes part of the material, takes several days, and often leads to ambiguous results (a variety of reasons can cause poor Northern hybridisation signals). In addition, it is no longer possible when using small amounts of tissue or cells. RNA degradation is only to be expected under three conditions: 1) cell survival is not maintained before lysis or freezing; 2) cells are thawed outside the lysis buffer; and 3) poor-quality reagents are used. Since RNase-free reagents are now available from a variety of companies, it is much more rapid and effective to check survival (i.e. select the appropriate culture medium) and freezing conditions than to perform tricky tests on RNA aliquots.

[0035] The purity of mRNAs isolated with oligo(dT) beads is better than that obtained with conventional methods. When we generated SAGE libraries using mRNAs extracted with guanidinium thiocyanate and oligo(dT) columns, nuclear-encoded rRNAs amounted to 1% of the sequenced tags. Using the alternative mRNA extraction procedure, rRNA tags are no longer present in the library.




2. 1st and 2nd strand synthesis 
2.1. 1st strand cDNA synthesis 



[0036] The first step in the synthesis of cDNA is copying the mRNA template into complementary single-strand cDNA. In protocol 2, 1st strand cDNA is synthesised using Moloney Murine Leukaemia Virus RT (M-MLV RT). With this enzyme, we have been able to generate SADE libraries from either large or minute amounts of cells (Table 1). In our latest series of experiments, however, we have used Superscript II M-MLV RT, provided with the Superscript cDNA synthesis kit (Life Technologies, ref. 18090-019). In this case, the amount of cDNA formed (see 2.3) was increased ~4-fold. Although this better yield likely results both from the synthesis of longer cDNAs (which is not essential for the current application) and from a higher number of cDNA molecules, we strongly recommend using Superscript II M-MLV RT for very small samples (<50,000 cells). The protocol is similar to the one described here, except for the reaction volumes (20 µl for 1st strand synthesis, and 150 µl for 2nd strand synthesis).

[0037] mRNAs are generally heated for 5 min at 65°C before reverse transcription to break up secondary structures. Since such a high temperature would also denature the mRNA-oligo(dT)25 hybrid, we only heat the sample at 42°C before initiating 1st strand synthesis.



2.2. 2nd strand synthesis 

[0038] Many procedures have been developed for 2nd strand cDNA synthesis. The method used here is a modification of the Gubler and Hoffman procedure. Briefly, the mRNA (in the mRNA-cDNA hybrid) is nicked by E. coli RNase H. E. coli DNA polymerase initiates second strand synthesis by nick translation, and E. coli DNA ligase seals any breaks left in the second strand cDNA. The procedure is described in protocol 2. This step is usually very efficient (approximately 100%), so a 2 h incubation period is sufficient when starting from macroamounts of material (>100 mg of tissue or 10⁷ cells).

Protocol 2. cDNA synthesis and cleavage

[0039] Equipment and reagents

cDNA synthesis kit (Life Technologies, ref. 18267-013), containing all buffers and enzymes necessary for first and second strand cDNA synthesis.
α[32P]dCTP, 6000 Ci/mmol (Amersham, ref. AA0075).
TEN (10 mM Tris-HCl (pH 8.0), 1 mM EDTA, 1 M NaCl).
Restriction endonuclease Sau3A I, 4 U/µl (New England Biolabs, ref. 169L), provided with 10X reaction buffer and purified 100X bovine serum albumin (BSA, 10 mg/ml).
Magnetic Particle Concentrator MPC (Dynal).
Geiger counter.

Automated thermal cycler or water baths equilibrated at 42°C, 37°C, and 16°C.

[0040] Method 

1. Resuspend the beads in 12.5 µl of 1X first strand (i.e. RT) reaction buffer.

2. Incubate 2 min at 42°C.

3. Place the tube at 37°C for 2 min. Add 12.5 µl of the following mix: 5 µl DEPC-treated water, 2.5 µl 5X first strand buffer, 1.25 µl dNTP 10 mM, 2.5 µl DTT 100 mM, 1.25 µl M-MLV reverse transcriptase. Incubate 1 h at 37°C and chill on ice.

4. On ice, prepare the following mix: 169.7 µl DEPC-treated water, 4.5 µl dNTP 10 mM, 24 µl 2nd strand buffer, 2 µl α[32P]dCTP, 6 µl E. coli DNA polymerase I, 1.05 µl E. coli RNase H, 0.75 µl E. coli DNA ligase, 2 µl glycogen (5 µg/µl).

5. Add 175 µl to the first strand tube and incubate overnight at 16°C. Keep the remaining mix for subsequent measurement of its radioactivity and calculation of dCTP specific activity.

6. Wash the beads to remove unincorporated α[32P]dCTP: 4 times with 200 µl TEN + BSA*, and 3 times with 200 µl ice-cold 1X mix Sau3A I + BSA*. Check with a Geiger counter that the last eluate is not radioactive, whereas the material bound to the beads is highly radioactive.

7. Add the following mix to the beads: 88 µl H2O, 10 µl 10X mix Sau3A I, 1 µl 100X BSA, 1 µl Sau3A I. Incubate 2 h at 37°C, vortexing intermittently.

* Final concentration of BSA: 0.1 mg/ml.
8. Chill 5 min on ice.

9. Using the MPC, remove the supernatant, which contains the 5' end of the cDNA. Wash once with 200 µl of 1X mix Sau3A I + BSA*. Remove this second supernatant, pool it with the first one, and store the resulting solution (300 µl) in order to measure the yield of second strand synthesis (see text). Before going to step 10, check with a Geiger counter that both the eluate and the beads are radioactive.

10. Resuspend the beads in 200 µl TEN supplemented with BSA*.
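The volumes in Protocol 2 can be cross-checked with simple arithmetic: the 1st and 2nd strand reactions come to 25 µl and 200 µl, the reaction volumes listed in Table II for the smallest scale, and the Sau3A I digestion gives the 0.1 mg/ml BSA stated in the footnote. All values below are copied from the steps above; the dictionary keys are just labels.

```python
# Volumes in microlitres, copied from Protocol 2.
first_strand_mix = {"DEPC water": 5, "5X first strand buffer": 2.5,
                    "dNTP 10 mM": 1.25, "DTT 100 mM": 2.5, "M-MLV RT": 1.25}
second_strand_mix = {"DEPC water": 169.7, "dNTP 10 mM": 4.5,
                     "2nd strand buffer": 24, "alpha-32P dCTP": 2,
                     "DNA polymerase I": 6, "RNase H": 1.05,
                     "DNA ligase": 0.75, "glycogen 5 ug/ul": 2}
sau3a_mix = {"H2O": 88, "10X Sau3A I buffer": 10, "100X BSA": 1, "Sau3A I": 1}

first_strand_total = 12.5 + sum(first_strand_mix.values())    # steps 1 and 3
second_strand_prepared = sum(second_strand_mix.values())      # step 4
second_strand_total = first_strand_total + 175                # step 5
sau3a_total = sum(sau3a_mix.values())                         # step 7
bsa_mg_per_ml = 10 * sau3a_mix["100X BSA"] / sau3a_total      # 100X stock = 10 mg/ml

print(first_strand_total, round(second_strand_prepared, 2),
      second_strand_total, sau3a_total, bsa_mg_per_ml)
```

The 2nd strand mix totals 210 µl, so about 35 µl remain after step 5 for measuring the dCTP specific activity.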

2.3. Yield of 2nd strand synthesis 

[0041] A method to calculate the yield of first and second strand cDNA synthesis is given in the cDNA synthesis kit instruction manual. We do not measure the yield of 1st strand cDNA synthesis since, as discussed above (1.3), this would require setting aside part of the preparation.

[0042] The amount of double-strand (ds) cDNA formed is calculated by measuring the radioactivity incorporated into the 5' end of the cDNA, which is released into the supernatant after Sau3A I digestion (see Protocol 2). The 300-µl supernatant is extracted with PCI, and the ds cDNA is ethanol precipitated in the presence of glycogen (50 µg/ml) and 2.5 M ammonium acetate. The pellet is resuspended in 8 µl of TE. Half of the material is used for liquid scintillation counting, and the remainder is loaded on a 1.0 or 1.5% agarose gel. For experiments carried out on 250, 30, 4, and 0.5 mg of mouse kidney, we obtained the following amounts of ds cDNA: 2.8, 0.3, 0.05, and 0.01 µg, respectively. The highest amount corresponds to the incorporation of 1.3% of the input radioactivity. In these experiments, three of the four cDNA samples could be detected by ethidium bromide staining after gel electrophoresis (Fig. 2). Their size ranged from <0.2 to ~3 kbp (the small size of most cDNA fragments is due to Sau3A I digestion). When cDNA amounts are below the detection threshold of ethidium bromide staining, autoradiographic analysis can be performed. In this case, the gel is fixed in 10% acetic acid, vacuum dried, and exposed overnight at -80°C with one intensifying screen for autoradiography.
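As a quick consistency check, the amounts reported above can be expressed per mg of tissue. The short Python sketch below does only that arithmetic; the dictionary simply transcribes the four values given in this section, and the function name is illustrative.

```python
# ds cDNA recovered in the Sau3A I supernatant, as reported in section 2.3:
# tissue mass (mg) -> ds cDNA (µg)
REPORTED_DS_CDNA_UG = {250: 2.8, 30: 0.3, 4: 0.05, 0.5: 0.01}


def yield_ng_per_mg(tissue_mg: float, ds_cdna_ug: float) -> float:
    """Express a ds cDNA amount (µg) as ng of ds cDNA per mg of tissue."""
    return ds_cdna_ug * 1000.0 / tissue_mg


for tissue_mg, ds_cdna_ug in REPORTED_DS_CDNA_UG.items():
    per_mg = yield_ng_per_mg(tissue_mg, ds_cdna_ug)
    print(f"{tissue_mg:>6} mg tissue: {per_mg:5.1f} ng ds cDNA per mg")
```

With these figures, recovery stays within roughly 10-20 ng of ds cDNA per mg of kidney across a 500-fold range of starting material.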


3. Linkers design, preparation, and ligation 
3.1. Linkers design 

[0043] A variety of linkers can be used at this point. Linkers must contain three important sequences: a) the appropriate anchoring enzyme overhang; b) a recognition site for a type IIS restriction enzyme (tagging enzyme); c) a priming site for PCR amplification. High-quality linkers are crucial for successful library generation.

[0044] Table III provides the sequences of the linkers and PCR primers used in our experiments. All four linkers must be obtained gel-purified. Linkers 1B and 2B carry two modifications: a) 5'-end phosphorylation, and b) a C7 amino modification at the 3' end. Linker phosphorylation can be performed either enzymatically with T4 polynucleotide kinase or chemically at the time of oligonucleotide synthesis. In both cases, phosphorylation efficiency must be tested (Protocol 3). We use chemically phosphorylated linkers. The 3'-end modification serves to increase the efficiency of ditag formation (Protocol 5, steps 8-11): the modified 3' end cannot be blunt-ended and will not ligate to cDNA tags or linkers.

Table III. Sequence of linkers and PCR primers

Oligonucleotide   Sequence
Linker 1A         5'-TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC-3'
Linker 1B^a       5'-GATCGTCCCTGTGCTGACATGAATGACCGACTTGAGTGACCTGGCA-3'
Linker 2A         5'-TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC-3'
Linker 2B^a       5'-GATCGTCCCTTCCGACTGTGATTAGACGAGCCTTGAGCCTGAGCAA-3'
Primer 1          5'-GCCAGGTCACTCAAGTCGGTCATT-3'
Primer 2          5'-TGCTCAGGCTCAAGGCTCGTCTA-3'

^a Linkers 1B and 2B include two modifications (5'-phosphorylation and 3'-C7 amino modification).




[0045] The PCR priming site was designed with the help of Oligo™ software (Medprobe, Norway) in order to obtain PCR primers with a high Tm (60°C) and to avoid self-priming or sense/antisense dimer formation. Two different priming sites must be designed in the "left" and "right" linkers; otherwise the target will undergo panhandle formation and thus escape PCR amplification.
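The internal consistency of Table III can be checked with a few lines of Python. This sketch is an illustration, not part of the protocol: it assumes the sequences are transcribed correctly from the table, verifies that each B strand (minus its GATC overhang) is the reverse complement of the 3' part of its A strand, that each A strand ends with the BsmF I site GGGAC, and that each primer lies within its linker, then derives the size of the product formed when the two hybrids are joined through their GATC ends. Helper names are hypothetical.

```python
# Sequences transcribed from Table III (5'->3').
LINKER_1A = "TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC"
LINKER_1B = "GATCGTCCCTGTGCTGACATGAATGACCGACTTGAGTGACCTGGCA"
LINKER_2A = "TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC"
LINKER_2B = "GATCGTCCCTTCCGACTGTGATTAGACGAGCCTTGAGCCTGAGCAA"
PRIMER_1 = "GCCAGGTCACTCAAGTCGGTCATT"
PRIMER_2 = "TGCTCAGGCTCAAGGCTCGTCTA"

ANCHOR_OVERHANG = "GATC"  # Sau3A I cohesive end
BSMFI_SITE = "GGGAC"      # BsmF I recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_hybrid(strand_a: str, strand_b: str) -> tuple:
    """Return (duplex length, unpaired 5' end length of strand A) for an A/B hybrid."""
    assert strand_b.startswith(ANCHOR_OVERHANG), "B strand must carry the GATC overhang"
    core = strand_b[len(ANCHOR_OVERHANG):]
    # The B strand (minus GATC) must pair with the 3' end of the A strand.
    assert strand_a.endswith(revcomp(core)), "strands are not complementary"
    assert strand_a.endswith(BSMFI_SITE), "A strand should end with the BsmF I site"
    return len(core), len(strand_a) - len(core)


duplex_1, overhang_1 = check_hybrid(LINKER_1A, LINKER_1B)
duplex_2, overhang_2 = check_hybrid(LINKER_2A, LINKER_2B)
assert PRIMER_1 in LINKER_1A and PRIMER_2 in LINKER_2A

# Two hybrids joined through their GATC ends:
# free end + duplex + GATC junction + duplex + free end
dimer_len = overhang_1 + duplex_1 + len(ANCHOR_OVERHANG) + duplex_2 + overhang_2
print(duplex_1, duplex_2, dimer_len)  # 42 42 94
```

The computed 94 positions (3-nt single-stranded end, 42-bp duplex, 4-nt GATC junction, 42-bp duplex, 3-nt end) match the 94-bp ligation product expected in Protocol 3, step 4, if the single-stranded ends are counted.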

Protocol 3. Preparing and testing linkers 


[0046] Equipment and reagents 



Linkers 1A, 1B, 2A, and 2B at 20 pmol/µl.
Primers 1 and 2 at 20 pmol/µl.
T4 DNA ligase 1 U/µl (Life Technologies, ref. 15224-017) and 5X reaction buffer.
10 mM ATP.
PCR reagents: Taq DNA polymerase 5 U/µl (Eurobio), 10X PCR buffer (200 mM Tris-HCl (pH 8.3), 15 mM MgCl2, 500 mM KCl, 1 mg/ml gelatin), 1.25 mM dNTP, 100 mM MgCl2, and 100 mM DTT.
Restriction endonucleases Sau3A I (4 U/µl) and BsmF I (2 U/µl) (New England Biolabs, refs. 169L and 572L), provided with 10X reaction buffer and 100X BSA.
10-bp DNA ladder (Life Technologies, ref. 10821-015).
Automated thermal cycler and water baths equilibrated at 14°C, 37°C, and 65°C.
Tris-HCl buffered (pH 7.9) phenol-chloroform-isoamyl alcohol (PCI).
10 M ammonium acetate.
TE (10 mM Tris-HCl (pH 8.0), 1 mM EDTA).

[0047] Method 

1. Mix 25 µl of linker 1A and 25 µl of linker 1B in a 0.5 ml PCR tube (final concentration: 10 pmol/µl). Proceed similarly for linkers 2A and 2B.

2. Transfer the PCR tubes to the thermal cycler. Heat at 95°C for 2 min, then let cool on the bench at room temperature for 20 min. Store at -20°C.

3. Test self-ligation of each hybrid, as well as ligation of hybrid (1A/1B) with hybrid (2A/2B). Set up 3 ligation reactions by mixing 1 µl of hybrid (1A/1B) (tube 1), 1 µl of hybrid (2A/2B) (tube 2), or 0.5 µl of hybrid (1A/1B) plus 0.5 µl of hybrid (2A/2B) (tube 3) with 2 µl 10 mM ATP, 4 µl 5X ligase mix, 12 µl H2O, and 1 µl T4 DNA ligase.

4. Incubate 2 h or overnight at 14°C. Analyse 10-µl aliquots on a 3% agarose gel using the 10-bp DNA ladder as marker. Most of the material (>80%) consists of a 94-bp DNA fragment.

5. Proceed to PCR using 10* targets from the tube 3 reaction (dilute using TE buffer supplemented with 0.1 mg/ml BSA). Mix 1 µl of diluted ligation product, 5 µl 10X PCR buffer, 1 µl 100 mM MgCl2, 8 µl 1.25 mM dNTP, 2.5 µl primer 1, 2.5 µl primer 2, and 30 µl water. Prepare 4 such reactions and a control tube without linker, transfer them to the thermal cycler, and heat at 80°C for 2 min.

6. Add to each tube 50 µl of Taq polymerase amplification mix (5 µl 10X PCR buffer, 4 µl 100 mM DTT, 0.5 µl Taq polymerase, 40.5 µl water), and 60 µl of mineral oil if necessary for your thermal cycler.

7. Perform 29 PCR cycles (95°C, 30 s; 58°C, 30 s; 70°C, 45 s), followed by an additional cycle with a 5-min elongation time.

8. Analyse 10-µl aliquots on a 3% agarose gel. A 90-bp amplification fragment is clearly visible.

9. Pool all 4 PCR samples and extract with an equal volume of PCI. Transfer the aqueous (upper) phase to a fresh tube, then add 100 µl 10 M ammonium acetate and 500 µl isopropanol. Invert the tubes several times to mix, centrifuge (15,000g) at 4°C for 20 min, wash twice with 400 µl 75% ethanol, vacuum dry, and resuspend the pellet

10. Set up two 50-µl digestion reactions using 5 µl of DNA and 4 U of Sau3A I or BsmF I. Incubate 1 h at 37°C (Sau3A I) or 65°C (BsmF I).

11. Analyse 10-µl aliquots on a 3% agarose gel, running 1 µl of uncut PCR product in parallel. Sau3A I and BsmF I digestion must be >80% complete.

3.2. Linkers preparation 

[0048] It is essential to check that ds linkers can be ligated, PCR amplified, and digested with the anchoring and tagging enzymes. Success with the Protocol 3 experiments is a prerequisite before attempting to prepare a library. The PCR conditions described here have been optimised for Hybaid thermal reactors (TR1 and Touch Down) working under control or simulated tube conditions; different conditions may be needed with other machines. Note that since the target is quite small (90 bp), elongation is performed at a relatively low temperature (70°C).

3.3. Ligating linkers to cDNA 

[0049] The concentration of ds linkers should be adapted to the amount of cDNA used to prepare the library. In the original protocol of Velculescu et al., 2 µg (74 pmol) of ds linkers are used. Considering that about 2 µg of cDNA with an average size of 1-2 kb is obtained from 5 µg of mRNA, the amount of cDNA available for ligation is in the range of 1.5-3 pmol. Since a large excess of linkers decreases the PCR signal-to-noise ratio, we perform ligation with 8 pmol ds linkers for libraries generated from 250 mg of tissue (~5 µg mRNA). Starting from 5×10^4-10* cells (10-40 ng mRNA), 0.5 pmol of ds linkers is used. A lower amount of linkers may still allow efficient ligation, but we have not tested this.
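The conversion behind these figures can be sketched as follows. It assumes the usual approximation of ~660 g/mol per base pair of double-stranded DNA (an assumption of this sketch, not a value given in the text); the function name is illustrative.

```python
AVG_MW_PER_BP = 660.0  # g/mol per bp of ds DNA (standard approximation, assumed here)


def ds_dna_pmol(mass_ug: float, length_bp: float) -> float:
    """Approximate picomoles of ds DNA molecules of a given average length."""
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * AVG_MW_PER_BP)
    return moles * 1e12


# ~2 µg of cDNA with an average size of 1-2 kb (figures quoted in [0049])
for length_bp in (1000, 2000):
    print(f"2 ug of {length_bp}-bp cDNA ~ {ds_dna_pmol(2.0, length_bp):.2f} pmol")

# Molar excess of ds linkers over a mid-range (1.5 kb) cDNA estimate
cdna_pmol = ds_dna_pmol(2.0, 1500)
for linker_pmol in (74.0, 8.0):
    print(f"{linker_pmol:g} pmol linkers ~ {linker_pmol / cdna_pmol:.0f}-fold molar excess")
```

On this basis, the original 74 pmol corresponds to a roughly 25- to 50-fold molar excess over 1.5-3 pmol of cDNA, whereas 8 pmol corresponds to about a 3- to 5-fold excess.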

Protocol 4. Ligating ds linkers to cDNA 
[0050] Equipment and Reagents 

Hybrids (1A/1B) and (2A/2B) at 0.5 pmol/µl, obtained from Protocol 3, steps 1-2.
TEN, TE, and LoTE (3 mM Tris-HCl (pH 7.5), 0.2 mM EDTA), stored at 4°C.
10X NEB IV reaction buffer and 100X BSA (New England Biolabs).
T4 DNA ligase 5 U/µl (Life Technologies, ref. 15224-041) and 5X ligation mix; 10 mM ATP.
MPC (Dynal).
Water baths equilibrated at 45°C and 16°C.
Geiger counter.

[0051] Method 

1. Once the experiments described in Protocol 2 have been carried out, perform 2 additional washes of the beads before ligating ds linkers to the cDNA. Using the MPC, wash the beads with 200 µl of TEN + BSA^a. Resuspend the beads in 200 µl of the same buffer (take care to recover the beads completely: mix by repeated pipetting and scrape the tube wall with the pipette tip), then separate into two 100-µl aliquots: one will be ligated to hybrid (1A/1B), the other to hybrid (2A/2B).

2. Add 10 µl of fresh Dynabeads to two 1.5 ml tubes. These tubes will now be treated like the two others and used as negative controls.

3. Wash the 4 tubes twice with 200 µl of ice-cold TE buffer + BSA^a.

4. Immediately after the last rinse, add to each tube 34 µl of the appropriate mix containing 8 µl of 5X ligase buffer and 0.5 pmol of hybrid (1A/1B) or hybrid (2A/2B). Heat 5 min at 45°C, then chill on ice.

5. Add to each tube 4 µl of 10 mM ATP and 2 µl of T4 DNA ligase (final volume: 40 µl). Incubate overnight at 14°C.

6. Wash the beads thoroughly (free linkers will poison the PCR amplification) as follows: 4 times with 200 µl of TEN + BSA^a and 3 times with 200 µl of 1X NEB IV + BSA^a. After the first rinse with NEB IV, take care to resuspend the beads completely (see above) and transfer them to fresh tubes. After the last rinse, check that radioactivity is still present on the beads but absent from the supernatant.

7. Proceed to Protocol 5 or store at 4°C.

[0052] After ligation (step 5 of Protocol 4), it is very important to wash the beads extensively in order to remove free ds linkers. If ds linkers not ligated to cDNA fragments are not thoroughly eliminated from each sample, the library will contain large amounts (up to 25%) of linker sequences, which makes data acquisition inefficient.



4. Ditags formation 

4.1. Release of cDNA tags 



[0053] Digestion with the tagging enzyme (BsmF I) releases only small DNA fragments from the oligo(dT) beads. Consequently, much of the radioactivity remains bound to the beads at this stage. To check that extensive rinsing did not cause a large loss of material, we usually measure bead radioactivity by Cerenkov counting after BsmF I digestion (the data must be corrected for the efficiency of Cerenkov counting, ~50% of liquid scintillation counting efficiency). For the experiments described above on 250, 30, 4, and 0.5 mg of mouse kidney, the amounts of ds cDNA remaining on the beads reached 450, 67, 13, and 1.8 ng, respectively. Comparison of these data with those for the Sau3A I-released fragments (2.3) indicates that ~6 times less cDNA is recovered on the beads than in the Sau3A I supernatants. The average size of Sau3A I-cut fragments is predicted to be 256 bp. The fraction that remains bound on the beads after Sau3A I digestion thus suggests that the average length of the cDNA formed is ~1.5 kb, which seems quite reasonable.
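The arithmetic behind this estimate can be sketched in Python. Two assumptions are ours, not stated in the text: released and bead-bound cDNA carry the same specific radioactivity, and each bead-bound molecule retains on average one 3'-terminal Sau3A I fragment of the expected mean size (a 4-bp site is expected once every 4^4 = 256 bp of random sequence). Under these assumptions, the released-to-bound mass ratio multiplied by 256 bp approximates the cDNA length released per molecule.

```python
# ds cDNA (ng) per experiment, keyed by tissue mass (mg)
RELEASED_NG = {250: 2800.0, 30: 300.0, 4: 50.0, 0.5: 10.0}  # Sau3A I supernatant (2.3)
BOUND_NG = {250: 450.0, 30: 67.0, 4: 13.0, 0.5: 1.8}        # left on beads after BsmF I

SITE_LENGTH = 4                      # Sau3A I recognises GATC
MEAN_FRAGMENT_BP = 4 ** SITE_LENGTH  # 256 bp expected between random GATC sites


def released_to_bound(tissue_mg: float) -> float:
    """Mass ratio of Sau3A I-released to bead-bound ds cDNA for one experiment."""
    return RELEASED_NG[tissue_mg] / BOUND_NG[tissue_mg]


for tissue_mg in RELEASED_NG:
    ratio = released_to_bound(tissue_mg)
    print(f"{tissue_mg} mg: ratio {ratio:.1f} -> ~{ratio * MEAN_FRAGMENT_BP:.0f} bp released per molecule")

# With the rounded ratio of ~6 used in the text:
print(6 * MEAN_FRAGMENT_BP)  # 1536 bp, i.e. ~1.5 kb
```

Counting the bead-bound fragment itself would add one further ~256-bp fragment to each estimate; the calculation is only an order-of-magnitude check.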

[0054] The whole amount of BsmF I-released material is used for ditag formation, and we have never attempted to quantify it. Nevertheless, the efficiency of BsmF I digestion can be checked when ≥4 mg of tissue is used for library generation: in this case, a Geiger counter can detect radioactivity in the BsmF I supernatant.



4.2. Blunt ending of released cDNA tags 

[0055] Different enzymes may be used for blunt ending BsmF I-released tags. In their original study, Velculescu et al. (7) carried out the blunt-ending reaction with T4 DNA polymerase. In more recent applications, Klenow DNA polymerase was used (8) and is now recommended. In our experience too, library generation is very inefficient with T4 DNA polymerase. This is likely because blunt ending with T4 DNA polymerase is carried out at 11°C (12). Such a low temperature allows protruding termini from unrelated cDNA tags to hybridize, and is thus expected to markedly decrease the amount of material available for the blunt-ending reaction. We have successfully used Vent and sequencing-grade T7 DNA polymerases to generate blunt ends. The procedure described in Protocol 5 uses T7 DNA polymerase.

Protocol 5. Release, blunt ending, and ligation of cDNA tags 



[0056] Equipment and reagents

BsmF I, 10X NEB IV buffer and 100X BSA.
PCI.
10 M ammonium acetate.
Sequencing grade T7 DNA polymerase (Pharmacia Biotech, ref. 27098503).
5X mix salt (200 mM Tris-HCl (pH 7.5), 100 mM MgCl2, 250 mM NaCl).
2 mM dNTP.
T4 DNA ligase (5 U/µl) and 5X reaction buffer.
100% ethanol, 75% ethanol.
Geiger counter.
Water baths equilibrated at 65°C, 42°C, and 16°C.

[0057] Method

1. Remove the supernatant and immediately add to the beads 100 µl of the following mix: 87 µl H2O, 10 µl 10X NEB IV, 1 µl 100X BSA, 2 µl BsmF I.

2. Incubate 2 h at 65°C, vortexing intermittently.

3. Chill 5 min at room temperature, collect the supernatant (which contains the released cDNA tags), and wash the beads twice with 75 µl of ice-cold TE + BSA^a. Pool all 3 supernatants (250 µl final volume) and add 60 µg glycogen to each of the 4 reaction tubes. Measure the radioactivity still present on the beads by Cerenkov counting (see text).

4. Add 250 µl (1 volume) PCI to each of the 4 supernatants.

5. Vortex, then centrifuge (10,000g) 10 min at 4°C. Transfer the upper (aqueous) phase to a fresh tube.

6. Precipitate at high ethanol concentration: add to the aqueous phase 125 µl 10 M ammonium acetate and 1.125 ml 100% ethanol, then centrifuge (15,000g) 20 min at 4°C.

7. Wash the pellet twice with 400 µl of 75% ethanol. Vacuum dry and resuspend the pellet in 10 µl LoTE.

8. Add 15 µl of 1X mix salt to each tube and heat 2 min at 42°C. Keep the tubes at 42°C and add 25 µl of the following mix: 7.5 µl H2O, 5.5 µl 100 mM DTT, 11 µl dNTP mix, 1 µl T7 DNA polymerase. Incubate 10 min at 42°C.

9. Pool the tags ligated to hybrid (1A/1B) with those ligated to hybrid (2A/2B). Rinse the tubes with 150 µl LoTE + 20 µg glycogen and add this solution to the pooled reactions (final volume: 250 µl). You now have 2 tubes (1 sample, 1 negative control).

10. Extract with an equal volume of PCI and precipitate at high ethanol concentration (see steps 4-6). Resuspend the pellet in 6 µl LoTE.

11. Ligate tags to form ditags by adding to the 6-µl sample: 2 µl 5X mix ligase, 1 µl 10 mM ATP, and 1 µl of T4 DNA ligase (5 U/µl). Proceed similarly for the negative control, incubate overnight at 16°C, then add 90 µl LoTE.

5. PCR Amplification 

[0058] Considering the linkers and primers used in our studies, the desired PCR product is 110 bp long (90 bp of linker-derived sequences and 20 bp of ditag).

5.1. PCR buffers and procedures 

[0059] For PCR amplification of ditags, we use buffers and conditions different from those described by Velculescu et al.:

(a) Amplification is performed with standard PCR buffers without DMSO or β-mercaptoethanol. The composition of our 10X PCR buffer is given in Protocol 3; Promega buffer (ref. M1901) works equally well. The conditions used in our assay are as follows: 100 µM dNTP, 2.5 mM MgCl2 (2 mM with Promega buffer), 0.5 µM primers, and 5 U Taq polymerase. High amounts of primers and Taq polymerase are used (standard reactions are generally performed with 0.1 µM primers and 1.25 U of enzyme) to ensure a high yield of ditags. The dNTP concentration is also slightly higher than for standard PCR amplifications (100 vs. 50 µM). Very high dNTP concentrations should nevertheless be avoided, since they are known to increase Taq polymerase-dependent misincorporations.

(b) As suggested initially (7), we still perform small-scale PCR, purify the 110-bp fragment, then submit it to preparative PCR.
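The final concentrations quoted in (a) follow from the volumes used in Protocol 6 (steps 1 and 3). A minimal Python check, assuming the 10X PCR buffer of Protocol 3 (15 mM MgCl2) and primer stocks at 20 pmol/µl (i.e. 20 µM); the table layout and names are illustrative:

```python
# (reagent, stock concentration, µl added) for one 100-µl reaction of Protocol 6:
# step 1 (46-µl master mix + 4 µl DNA) and step 3 (50-µl Taq mix).
ADDITIONS = [
    ("MgCl2 (mM)", 15.0, 5.0),     # MgCl2 carried by 5 µl 10X PCR buffer (master mix)
    ("MgCl2 (mM)", 100.0, 1.0),    # supplementary 100 mM MgCl2
    ("dNTP (mM)", 1.25, 8.0),
    ("primer 1 (uM)", 20.0, 2.5),  # 20 pmol/µl = 20 µM
    ("primer 2 (uM)", 20.0, 2.5),
    ("MgCl2 (mM)", 15.0, 5.0),     # MgCl2 carried by 5 µl 10X PCR buffer (Taq mix)
    ("DTT (mM)", 100.0, 4.0),
]
REACTION_UL = 100.0
TAQ_UNITS = 5.0 * 1.0  # 1 µl of Taq polymerase at 5 U/µl


def final_concentrations(additions, total_ul):
    """Apply C_final = C_stock * V_added / V_total, summing repeated additions of a reagent."""
    final = {}
    for name, stock, volume_ul in additions:
        final[name] = final.get(name, 0.0) + stock * volume_ul / total_ul
    return final


for name, value in final_concentrations(ADDITIONS, REACTION_UL).items():
    print(f"{name}: {value:g}")
print(f"Taq polymerase: {TAQ_UNITS:g} U")
```

The result (2.5 mM MgCl2, 100 µM dNTP, 0.5 µM of each primer, 5 U Taq polymerase per 100 µl) matches the conditions listed in (a); the same function can be reused to re-derive concentrations if volumes are scaled.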

5.2. Number of PCR cycles 

[0060] Before starting Protocol 6, the optimal number of PCR cycles needs to be determined. This is best accomplished by performing duplicate PCR on 2% of the ligation product and sampling 7-µl aliquots at different cycles. The number of cycles will of course depend on the amount of starting material. For 250-mg tissue pieces, a PCR signal should be obtained with 18 cycles, and the plateau reached at 22-23 cycles. The 110-bp fragment should be largely predominant (amplified products of 90 and 100 bp are not unusual). Examples of PCR carried out on ditags generated from tiny amounts of cells (15,000 to 45,000) are given in Fig. 3. With such low amounts of cells, the 110-bp product is no longer predominant. Nevertheless, if maximal yield is achieved with fewer than 30 cycles (as obtained from 45,000 cells in Fig. 3), a library that is fairly representative of the tissue can be generated. Small-scale PCR (10 reactions, steps 1-5 of Protocol 6) is performed on 2-µl and 4-µl aliquots of ligation product for macro- and microamounts of tissue, respectively.

Protocol 6. PCR Amplification of Ditags 

[0061] Equipment and reagents

Automated thermal reactor (Hybaid).
PCR reagents: Taq polymerase, primers 1 and 2 at 20 pmol/µl, 10X PCR buffer (see Protocol 3), 1.25 mM dNTP, 100 mM MgCl2, and 100 mM DTT.
β-Agarase and 10X reaction buffer (New England Biolabs, ref. 392L).
Sau3A I reaction buffer and 100X BSA.
Low melting point (LMP) agarose (Life Technologies, ref. 15517-022).
TBE 10X (1.12 M Tris, 1.12 M boric acid, 20 mM EDTA).
Vertical gel electrophoresis unit, with 20×20 cm plates, 1.5 mm thick spacers, and preparative comb.
10-bp DNA ladder.
Bromophenol blue loading buffer (0.125% bromophenol blue, 10% Ficoll 400, 12.5 mM EDTA) filtered through a 0.45-µm membrane.

[0062] Method

1. Prepare a master mix with the following reagents: 5 µl 10X PCR buffer, 1 µl 100 mM MgCl2, 8 µl 1.25 mM dNTP, 2.5 µl primer 1, 2.5 µl primer 2, 27 µl H2O (multiply these quantities by the number of reaction tubes, usually 12). Dispense equal aliquots (46 µl) into PCR tubes and add 4 µl of DNA sample (10 tubes), negative control, or H2O.

2. Transfer the tubes to the thermal reactor and heat 2 min at 80°C (hot-start conditions).

3. Add to each tube 50 µl of the following mix: 5 µl 10X PCR buffer, 4 µl 100 mM DTT, 40 µl H2O, 1 µl Taq polymerase. Add a drop of mineral oil if your thermal cycler requires it.

4. Perform PCR at the following temperatures: 30 s at 94°C, 30 s at 58°C, and 45 s at 70°C (27-30 cycles), followed by one cycle with an elongation time of 5 min.

5. Analyse an aliquot (7 µl) of each tube on a 3% agarose gel using the 10-bp DNA ladder as marker.

6. If the yield is satisfactory (see Fig. 3), pool the 10 PCR reactions into two 1.5 ml tubes. Add 30 µg glycogen to each tube and extract with PCI. Recover the aqueous phase and precipitate the DNA by centrifugation after adding 0.1 volume 3 M sodium acetate and 2.5 volumes 100% ethanol. Wash the pellet with 75% ethanol, vacuum dry, and resuspend in 300 µl LoTE. Add 75 µl of bromophenol blue loading buffer.

7. Electrophorese the PCR product through a 3% LMP agarose vertical gel (warm the plates 15 min at 55°C before pouring the gel). Run until the bromophenol blue has reached the bottom of the gel (~3 h).

8. Cut out the 110-bp fragment from the gel and place the agarose slice in a 2 ml tube. Add 0.1 volume 10X β-agarase mix, heat 10 min at 70°C, then 10 min at 40°C, and add β-agarase (6 U/0.2 g of agarose). Incubate 1 h 30 min at 40°C. Add 30 µg glycogen. Extract with PCI and ethanol precipitate as indicated in step 6. Resuspend the pellet in 300 µl LoTE.

9. After determination of the optimal number of PCR cycles (usually 12), perform large-scale PCR (140-150 reactions) using 2 µl of DNA and the protocol described in steps 1-4.

10. Pool the PCR reactions in 2 ml tubes. Extract with PCI, ethanol precipitate (step 6), and wash the pellet twice with 75% ethanol. Resuspend the dry pellets in a final volume of 470 µl 1X mix Sau3A I.



5.3. Purification and reamplification

[0063] The 110-bp PCR product can be purified either on a 12% polyacrylamide (7) or a 3% agarose slab gel. To avoid overloading and achieve efficient purification, load no more than 10-12 pooled PCR reactions on an agarose gel and cut the agarose as close as possible to the 110-bp fragment. Purification and the optimal number of PCR cycles should then be tested on duplicate 2-µl aliquots of the purified product. A single 110-bp band should now be obtained. The absence of interference from other amplified products is essential to produce large amounts of the 110-bp fragment.

6. Ditags isolation, concatenation, and cloning of concatemers 

6.1. Ditags isolation 

[0064] Two important points need to be addressed for ditag purification. First, since the total mass of linkers is nearly five times that of ditags, a high-resolution polyacrylamide gel is required to purify ditags thoroughly. Second, the short length of ditags makes them difficult to detect on gel by ethidium bromide staining. This problem can be overcome by staining the gel with SYBR Green I (or an equivalent product), which has a lower detection threshold than ethidium bromide (0.1 instead of 2 ng DNA). To obtain high sensitivity, the loading buffer should not contain bromophenol blue (bromophenol blue comigrates with ditags). After migration, the gel is stained in a polypropylene or PVC container.
[0065] Ditags do not run as a single band on polyacrylamide gels. This may come from subtle effects of base composition on electrophoretic mobility and/or some wobble in BsmF I digestion (7). We cut out from the gel all the material ranging from 22 to 26 bp. The elution procedure is labour-intensive but provides ditags that can be concatenated efficiently.

6.2. Concatenation 



[0066] Starting from 150 PCR reactions, at least 1 µg of ditags should be obtained. The optimal ligation time depends on the amount of ditags and on the purity of the preparation. We usually perform ligation for 2 h. When the yield is high (>1.4 µg), we set up two ligation reactions and allow them to proceed for 1 or 2 h. The corresponding concatemers are then purified separately on an 8% polyacrylamide gel.



Protocol 7. Ditags isolation and concatenation 
[0067] Equipment and Reagents



Sau3A I, 10X reaction buffer and 100X BSA.
50X TAE (2 M Tris, 5.7% glacial acetic acid, 50 mM EDTA).
12% polyacrylamide gel: 53.6 ml H2O, 24 ml 40% acrylamide (19:1 acrylamide:bis), 1.6 ml 50X TAE, 800 µl 10% ammonium persulfate, 69 µl TEMED.
10-bp DNA ladder.
SYBR Green I stain (FMC Bioproducts, ref. 50513).
T4 DNA ligase 5 U/µl and 5X ligation mix; 10 mM ATP.
Vertical gel electrophoresis unit, with 20×20 cm plates, 1.5 mm thick spacers, and preparative comb.
Xylene cyanole loading buffer (0.125% xylene cyanole, 10% Ficoll 400, 12.5 mM EDTA).
Spin-X microcentrifuge tubes (Costar, ref. 8160).



[0068] Method 

1. Save 1 µl of the 110-bp DNA fragment (step 10 of Protocol 6) and digest the remainder by adding 5 µl 100X BSA and 25 µl Sau3A I. Incubate overnight at 37°C in a hot-air incubator.

2. Check Sau3A I digestion: analyse 1 µl of uncut DNA and 1 µl and 3 µl of the Sau3A I digest on a 3% agarose gel (use bromophenol blue loading buffer for uncut DNA and xylene cyanole loading buffer for Sau3A I-digested DNA). Most (>80%) of the 110-bp fragment should be digested, and a faint band corresponding to the ditags can now be detected at ~25 bp.

3. Add 125 µl xylene cyanole loading buffer to the digested DNA sample and load on a preparative 12% polyacrylamide vertical gel in 1X TAE. Run at 30 mA until the bromophenol blue of the size marker is 12 cm from the well.

4. Transfer the gel into SYBR Green I stain at 1:10,000 dilution in 1X TAE. Wrap the container in aluminium foil and stain the gel for 20 min. Visualise on a UV box.

5. Cut out the ditag band (24-26 bp) and transfer the acrylamide slices to 0.5 ml tubes (for a 20-cm wide gel use 8 tubes). Pierce the bottom of the 0.5 ml tubes with an 18-gauge needle. Place the tubes in 2 ml tubes and spin 5 min at 10,000g. Prepare the following elution buffer for each tube: 475 µl LoTE, 25 µl 10 M ammonium acetate, 5 µg glycogen. Add 250 µl elution buffer to each 0.5 ml tube and centrifuge again. Discard the 0.5 ml tubes and add 250 µl elution buffer directly to each 2 ml tube. Incubate overnight at 37°C in a hot-air incubator.

6. Prepare a series of 16 Spin-X microcentrifuge tubes: add 20 µg glycogen to each collection tube. Transfer the content of each 2 ml tube (~600 µl) to two Spin-X tubes. Spin 5 min at 13,000g. Transfer 350 µl of eluted solution into 1.5 ml tubes (10-11 tubes), extract with PCI, and perform high-concentration ethanol precipitation. Wash twice with 75% ethanol, vacuum dry, and pool all pellets in 15 µl LoTE.

7. Measure the amount of purified ditags by dot quantitation (12) using 1 µl of sample. Total DNA at this stage is usually 1 µg, but a library can still be generated with 400 ng.

8. Ligate ditags to form concatemers: add to your sample (14 µl) 4.4 µl 5X mix ligase, 2.2 µl 10 mM ATP, and 2.2 µl concentrated (5 U/µl) T4 DNA ligase.

9. Incubate 2 h at 16°C. Stop the reaction by adding 5 µl of bromophenol blue loading buffer and store at -20°C.
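Both gel recipes (the 12% gel above and the 8% gel of Protocol 8) follow the same dilution rule for an 80-ml gel. The sketch below reproduces the listed volumes; treating the 69 µl of TEMED as negligible when computing the water volume is our assumption, chosen because it reproduces the 53.6 ml and 61.6 ml given in the text. The crosslinker ratio (19:1 or 37.5:1) is set by the choice of stock and is not computed. The function name is illustrative.

```python
def gel_recipe(percent: float, total_ml: float = 80.0, acrylamide_stock_pct: float = 40.0,
               tae_stock_x: float = 50.0, aps_ml: float = 0.8, temed_ul: float = 69.0) -> dict:
    """Volumes for a TAE polyacrylamide gel prepared from a 40% acrylamide stock."""
    acrylamide_ml = total_ml * percent / acrylamide_stock_pct  # C1*V1 = C2*V2
    tae_ml = total_ml / tae_stock_x                             # 1X TAE final
    # TEMED (microlitre range) is ignored when computing the water volume.
    water_ml = total_ml - acrylamide_ml - tae_ml - aps_ml
    return {
        "water_ml": round(water_ml, 2),
        "acrylamide_40pct_ml": round(acrylamide_ml, 2),
        "tae_50x_ml": round(tae_ml, 2),
        "aps_10pct_ml": aps_ml,
        "temed_ul": temed_ul,
    }


print(gel_recipe(12))  # Protocol 7 (19:1 acrylamide:bis stock)
print(gel_recipe(8))   # Protocol 8 (37.5:1 acrylamide:bis stock)
```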
6.3. Purification of concatemers

[0069] Concatemers are heated at 45°C for 5 min immediately before loading on the gel, to separate cohesive ends that are annealed but not ligated. Concatemers form a smear on the gel from about 100 bp to several kbp (Fig. 4) and can easily be detected using SYBR Green I stain. All fragments >300 bp (i.e. with 25 or more tags) are potentially useful for library construction. We usually cut out fragments of 350-600, 600-2000, and >2000 bp and generate a first library using the 600-2000 bp DNA fragments. Longer fragments would be more informative but are expected to be cloned with poor efficiency.
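The correspondence between concatemer length and tag number used here (>300 bp, i.e. 25 or more tags) and in 7.2.1 (240 bp, 20 tags) implies ~12 bp per tag, i.e. ~24 bp per ditag unit, presumably counting the 4-bp GATC junction. A minimal sketch of that rule of thumb, with an illustrative function name:

```python
BP_PER_TAG = 12  # 300 bp <-> 25 tags and 240 bp <-> 20 tags in the text


def approx_tags(insert_bp: int, bp_per_tag: int = BP_PER_TAG) -> int:
    """Rough number of tags carried by a concatemer of insert_bp base pairs."""
    return insert_bp // bp_per_tag


for size_bp in (300, 350, 600, 2000):
    print(size_bp, "bp ->", approx_tags(size_bp), "tags")
```

By this estimate, the 600-2000 bp fraction corresponds to roughly 50 to 170 tags per concatemer, although cloned inserts are usually shorter (see 7.2.1).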

Protocol 8. Purification and cloning of concatemers 

[0070] Equipment and reagents. 

SYBR Green I stain.
Vertical gel electrophoresis unit, with 20×20 cm plates, 1.5 mm thick spacers, and 20-well comb.
8% polyacrylamide gel: 61.6 ml H2O, 16 ml 40% acrylamide (37.5:1 acrylamide:bis), 1.6 ml 50X TAE, 800 µl 10% ammonium persulfate, 69 µl TEMED.
100-bp DNA ladder (Life Technologies, ref. 15628-019).
T4 DNA ligase 1 U/µl (Life Technologies, ref. 15224-017) and 5X reaction buffer.
10 mM ATP.
pBluescript II, linearized with BamH I and dephosphorylated.
E. coli XL2-Blue ultracompetent cells (Stratagene, ref. 200150).



[0071] Method 

1. Heat the sample 5 min at 45°C and load it into one lane of a 20-well 8% acrylamide gel. Run at 30 mA until the bromophenol blue is 10-12 cm from the well.

2. Stain the gel with SYBR Green I as described in Protocol 7 and visualise on a UV box.

3. Concatemers form a smear on the gel ranging from about 100 bp to the gel well (Fig. 4). Cut out regions containing DNA of 350-600, 600-2000, and >2000 bp. Purify the DNA of each of the three slices separately as described in steps 5-6 of Protocol 7 (a 1-h incubation of gel slices in LoTE/ammonium acetate solution is sufficient). Resuspend the pellet in 6 µl LoTE and generate a first library using concatemers of 600-2000 bp.

4. Mix 6 µl of concatemers and 2 µl (25 ng) of BamH I-cut pBluescript II. Heat 5 min at 45°C, then chill on ice.

5. Add 3 µl 5X mix ligase, 1 µl H2O, 1.5 µl 10 mM ATP, and 1.5 µl T4 DNA ligase (1 U/µl). Mix and incubate

6. Add 20 µg glycogen and 285 µl LoTE and extract with PCI. Ethanol precipitate, wash twice with 75% ethanol, vacuum dry, and resuspend the pellet in 12 µl LoTE.

7. Transform E. coli XL2-Blue ultracompetent cells with 1/3 (4 µl) of the ligation reaction according to the manufacturer's instructions. Plate different volumes (5 µl, 10 µl, 20 µl, 40 µl) of transformation mix onto Petri dishes containing Luria agar supplemented with ampicillin, X-gal, and IPTG. Incubate 15-16 h at 37°C. Save the remaining transformation solution (~900 µl): add 225 µl 80% glycerol, mix intermittently for 5 min, and store at -80°C. It will be used to plate additional bacteria if the library appears correct.

8. Count insert-free (i.e. blue) and recombinant (i.e. white) bacterial colonies on each plate. The fraction of recombinant colonies should be >50%, and their total number should be in the range of 10,000-60,000 per ml of transformation mix.
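The counts from step 8 can be turned into the two acceptance figures with a short calculation. This sketch assumes colonies are counted on a plate that received a known volume of transformation mix (step 7); the function names and the colony counts in the usage line are hypothetical.

```python
def plate_summary(white: int, blue: int, plated_ul: float) -> tuple:
    """Return (fraction of recombinant colonies, recombinants per ml of transformation mix)."""
    total = white + blue
    if total == 0 or plated_ul <= 0:
        raise ValueError("need colonies and a positive plated volume")
    return white / total, white * 1000.0 / plated_ul


def library_acceptable(fraction_recombinant: float, recombinants_per_ml: float) -> bool:
    """Criteria of step 8: >50% white colonies and 10,000-60,000 recombinants per ml."""
    return fraction_recombinant > 0.5 and 10_000 <= recombinants_per_ml <= 60_000


# Hypothetical counts: 240 white and 60 blue colonies on the 10-µl plate
fraction, per_ml = plate_summary(white=240, blue=60, plated_ul=10.0)
print(f"{fraction:.0%} recombinant, {per_ml:,.0f} recombinants/ml, "
      f"acceptable: {library_acceptable(fraction, per_ml)}")
```

Checking several plates (5-40 µl) and averaging the per-ml estimates reduces the effect of counting noise on sparsely populated plates.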






6.4. Cloning of concatemers 



[0072] Concatemers can be cloned and sequenced in a vector of choice. We currently clone concatemers in pBluescript II linearized with BamH I and dephosphorylated by calf intestinal alkaline phosphatase treatment. Any vector with a BamH I site in the multiple cloning site will be suitable. Velculescu et al. (7, 8) use pZero-1 from Invitrogen, which only allows recombinants to grow (DNA insertion into the multiple cloning site disrupts a lethal gene). The competent cells and transformation procedure (heat shock or electroporation) can also be changed according to your facilities. Whatever your choice, it is important to use bacterial cells allowing very high cloning efficiency (>5×10* transformants/µg of supercoiled DNA). An important point is to evaluate the number of clones obtained in the library (Protocol 8, step 8). Since a large number (1,000-2,000) of clones will be sequenced, the total number of recombinants should be >10,000.

[0073] Library screening can be performed by PCR or DNA miniprep. In our hands, DNA minipreps provide more reproducible amounts of DNA than PCR and avoid false-positive signals. We use the QIAprep 8 miniprep kit (Qiagen, ref. 27144), which allows 96 minipreps to be performed in ~2 hours. Plasmid DNA is eluted from the Qiagen columns with 100 µl of elution buffer; 5 µl are digested to evaluate insert size, and if the insert is >200 bp, 5 µl are used directly for DNA sequencing.

7. Data analysis

7.1. Software 

[0074] Once the cloned concatemers have been sequenced, tags must be extracted, quantified, and identified through possible data bank matches. Two software packages have been written to reach these goals.

[0075] The SAGE software (7) was written in Visual Basic and runs on personal computers under Microsoft Windows. It extracts tags from text sequence files, quantifies them, allows several libraries to be compared, and provides links to GenBank data downloaded from CD-ROM flat files or over the Internet. The latter function enables rapid identification of tags originating from characterised genes or cDNAs. However, the description of EST sequences is truncated, which makes it necessary to look up individual GenBank reports. The SAGE software also includes several simulation tools which allow, for example, assessment of the significance of differences observed between two libraries and evaluation of the sequencing accuracy.

[0076] The second software package is currently being developed at the University of Montpellier-2 (France) by J. Marti and co-workers as part of a database (CbC, for Cell by Cell) intended to store and retrieve data from SAGE experiments. Scripts for data extraction are developed in C under a Unix environment, and the database management system is implemented in Access®. Text files are concatenated to yield the working file from which tag sequences are extracted and enumerated. Treatment of raw data involves identification of vector contaminants, truncated ditags and repeated ditags (see below). For experiments on human, mouse and rat cell samples, the tags are searched in the non-redundant set of sequences provided by the UniGene collection. These data can be downloaded from the anonymous FTP site ncbi.nlm.nih.gov/repository/unigene/. Useful files are Hs.data.Z and Hs.seq.uniq.Z for man, and similar files for mouse and rat. The results are displayed in a table which provides the sequence of each tag and its number of occurrences, with the matching cluster number (Hs# for Homo sapiens, Mm# for Mus musculus and Rn# for Rattus norvegicus) and other data extracted from the source files, including GenBank accession numbers. For human genes, when available, a link is automatically established with GeneCards (http://bioinfo.weizmann.ac.il/cards/), giving access to additional information.
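The extraction step can be sketched as follows, assuming that each ditag lies between two consecutive Sau3A I sites (GATC), that ditags outside the 20-28 bp window of step (10) are discarded, and that each tag is the 10 bp immediately adjacent to a GATC site, the second tag of a ditag being read on the opposite strand. This is a hypothetical Python illustration, not the code of SAGE or CbC software; the names `extract_ditags`, `tags_from_ditag` and `count_tags` are invented for the example.

```python
from collections import Counter

ANCHOR = "GATC"                 # Sau3A I recognition site (anchoring enzyme)
TAG_LEN = 10                    # tag-specific bases adjacent to the anchor site
MIN_DITAG, MAX_DITAG = 20, 28   # ditag size window taken from step (10)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_ditags(concatemer: str) -> list[str]:
    """Split a concatemer read at GATC sites and keep ditags of valid length."""
    pieces = concatemer.upper().split(ANCHOR)
    # The first and last pieces border the cloning vector, so they are dropped.
    inner = pieces[1:-1]
    return [p for p in inner if MIN_DITAG <= len(p) <= MAX_DITAG]


def tags_from_ditag(ditag: str) -> tuple[str, str]:
    """First tag reads 5'->3' after GATC; the second lies on the other strand."""
    return ditag[:TAG_LEN], revcomp(ditag[-TAG_LEN:])


def count_tags(reads: list[str]) -> Counter:
    """Enumerate tags over all sequenced concatemer reads."""
    counts = Counter()
    for read in reads:
        for ditag in extract_ditags(read):
            counts.update(tags_from_ditag(ditag))
    return counts
```

Because each tag is stored without the GATC site, a GenBank query for a tag under these assumptions is simply "GATC" followed by the 10-bp tag.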

7.2. Library validity 

7.2.1. Insert length

[0077] An initial assessment of library quality is obtained by screening for insert length. A good library should contain >60% clones with inserts >240 bp (20 tags). Only one DNA strand is sequenced, since accuracy is obtained from the number of tags recorded rather than from the quality of individual runs. Depending on budget and sequencing facilities, either all clones or only the most informative ones will be sequenced. It should be noted that the average length of inserts does not match that of gel-purified concatemers. Although we usually extract concatemers 600-2000 bp long, most of the clones have inserts <600 bp, and we never obtain inserts >800 bp. Several reasons can explain this paradoxical result. Long inserts are known to be cloned with poor efficiency. In addition, they are expected to contain several repeats or inverted repeats, and may thus form unstable plasmid constructs. Supporting this interpretation, it has been demonstrated (14), and we have also observed, that efficient removal of linkers (which represent up to 20% of total tags in poor libraries) increases the average length of cloned inserts. In any case, it is worth emphasising that similar biological information is obtained from libraries with short and long inserts.
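The screening criterion above can be expressed as a small check. The sketch assumes about 12 bp of insert per tag, which is what the quoted equivalence of 240 bp and 20 tags implies; the function names are hypothetical.

```python
BP_PER_TAG = 12           # assumed: 240 bp insert corresponds to 20 tags
MIN_INSERT_BP = 240       # insert length regarded as "good"
MIN_GOOD_FRACTION = 0.60  # a good library has >60% of clones above that length


def estimated_tags(insert_bp: int) -> int:
    """Approximate number of tags carried by an insert of the given length."""
    return insert_bp // BP_PER_TAG


def library_passes(insert_sizes: list[int]) -> bool:
    """True when more than 60% of screened clones carry inserts >240 bp."""
    if not insert_sizes:
        return False
    good = sum(1 for size in insert_sizes if size > MIN_INSERT_BP)
    return good / len(insert_sizes) > MIN_GOOD_FRACTION
```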

7.2.2. Gene expression pattern 

[0078] The basic pattern of gene expression in eukaryotic cells was established long ago by kinetic analysis of mRNA-cDNA hybridization (15, 16). In a "typical" mammalian cell, the total mRNA mass consists of 300,000 molecules corresponding to ~12,000 transcripts, which divide into three abundance classes. A very small number of mRNAs (~10) are expressed at exceedingly high levels (3,000-15,000 copies/cell). A larger number of mRNAs (~500) reach an expression level in the range of 100-500 copies/cell. Finally, the majority of mRNAs (>10,000) are poorly expressed (10-100 copies/cell). This basic pattern should be observed in SAGE or SADE libraries. However, before translating tag abundance into a definite gene expression profile, the data must be scrutinised for artefacts encountered in library construction.
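Translating a tag count into copies per cell is a proportion scaled to the ~300,000 mRNAs of a typical cell. The Python sketch below does this and assigns each tag to one of the three abundance classes; because the class ranges quoted above leave gaps (e.g. 500-3,000 copies/cell), the cut-offs at 3,000 and 100 copies/cell are simplifying assumptions, and the function names are hypothetical.

```python
MRNA_PER_CELL = 300_000  # mRNA molecules in a "typical" mammalian cell


def copies_per_cell(tag_count: int, total_tags: int,
                    mrna_per_cell: int = MRNA_PER_CELL) -> float:
    """Scale a tag's share of the library to mRNA copies per cell."""
    return tag_count * mrna_per_cell / total_tags


def abundance_class(copies: float) -> str:
    """Assumed cut-offs: >=3,000 high, >=100 intermediate, otherwise low."""
    if copies >= 3_000:
        return "high"
    if copies >= 100:
        return "intermediate"
    return "low"
```

For instance, a tag seen 50 times among 50,000 tags corresponds to 300 copies/cell and falls in the intermediate class.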
7.2.3. Occurrence of linker-derived sequences

[0079] As mentioned above, some libraries display a high amount of linker sequences. If this amount is 20% or more, sequencing will be quite expensive, and it is better to start again from the RNA sample. Library contamination with 10-20% linker sequences is acceptable, 5-10% is good, and <5% is excellent. In addition to the two perfect linker matches (GTCCCTGTGC and GTCCCTTCCG), reading ambiguities can lead to sequences with one mismatch. These sequences are also easily identified since, assuming efficient enzymatic cleavage, the probability of having adjacent Sau3A I and BsmF I sites in the concatemers is normally zero. Linker and linker-like sequences can be automatically discarded using SAGE or CbC software, and their relative amounts can be used to evaluate the sequencing accuracy (see 7.?).
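Flagging linker-derived tags amounts to comparing each 10-bp tag with the two perfect linker matches (SEQ ID NO: 26 and 27) and tolerating one mismatch. In this hypothetical Python sketch, the grading function assumes that the "acceptable" band lies between 10% and 20% linker contamination; the function names are illustrative and are not taken from SAGE or CbC software.

```python
LINKER_TAGS = ("GTCCCTGTGC", "GTCCCTTCCG")  # perfect linker matches


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def is_linker_derived(tag: str, max_mismatch: int = 1) -> bool:
    """True for a perfect linker match or a linker-like tag (one mismatch)."""
    tag = tag.upper()
    return len(tag) == 10 and any(
        mismatches(tag, linker) <= max_mismatch for linker in LINKER_TAGS)


def linker_percent(tags: list[str]) -> float:
    """Percentage of tags that are linker or linker-like sequences."""
    if not tags:
        return 0.0
    return 100 * sum(is_linker_derived(t) for t in tags) / len(tags)


def contamination_grade(percent: float) -> str:
    """Assumed bands: >=20% restart, 10-20% acceptable, 5-10% good, <5% excellent."""
    if percent >= 20:
        return "restart from RNA"
    if percent >= 10:
        return "acceptable"
    if percent >= 5:
        return "good"
    return "excellent"
```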

7.2.4. Duplicate ditags 

[0080] Another category of sequences that must be deleted are those corresponding to duplicate ditags. Indeed, except in peculiar tissues (e.g. lactating mammary gland or laying hen oviduct) in which one or a very small number of transcripts make up the bulk of the mRNA mass, the probability for any two tags to be found several times in the same ditag is very small. Elimination of repeated ditags will therefore correct for preferential PCR amplification of some targets, and for the picking of several bacterial colonies originating from the same clone. Most ditags (>95%) generally occur only once when the library is constructed from macroamounts of tissue. For microlibraries, the percentage of unique ditags is generally lower [...]; when it is no longer compatible (<75%) with efficient data acquisition, it is [...] first (small-scale) PCR (see protocol 6). Duplicate ditags are automatically removed from the sequence files by SAGE and CbC software.
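A minimal sketch of duplicate-ditag removal follows. It assumes that a ditag and its reverse complement represent the same molecule, since a ditag can be inserted in either orientation within a concatemer, and it reports the percentage of ditag reads that occur only once, for comparison with the >95% and <75% figures above. The function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(ditag: str) -> str:
    """Assumption: a ditag read in either orientation maps to the same key."""
    ditag = ditag.upper()
    return min(ditag, revcomp(ditag))


def deduplicate_ditags(ditags: list[str]) -> list[str]:
    """Keep the first occurrence of each ditag, in reading order."""
    seen = set()
    unique = []
    for ditag in ditags:
        key = canonical(ditag)
        if key not in seen:
            seen.add(key)
            unique.append(ditag)
    return unique


def percent_unique(ditags: list[str]) -> float:
    """Percentage of ditag reads whose sequence occurs only once."""
    if not ditags:
        return 0.0
    counts = Counter(canonical(d) for d in ditags)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 100 * singletons / len(ditags)
```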
7.3. Number of tags to be sequenced



[0081] The number of tags to be analysed will obviously depend on the application and tissue source. As a matter of fact, reducing tissue complexity through the isolation of defined cell populations will markedly diminish the minimum number of tags required for accurate analysis, and will allow molecular and physiological phenotypes to be better correlated.

Delineation of the most expressed genes (>500 copies/cell) in one tissue and [...] expression level in another one will require the sequencing of only a few thousand tags (Fig. 5). Analysing 5,000 tags, 300 [...] will be detected at least 3 times. Since automated sequencers can read 48-96 templates simultaneously, 10,000 tags will be recorded from 5-10 gels if the average number of tags per clone is ~20.
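The sequencing workload quoted above follows from simple arithmetic: 10,000 tags at ~20 tags per clone require 500 clones, i.e. about 5.2 gels at 96 templates per gel or about 10.4 gels at 48. A hypothetical Python helper:

```python
import math


def clones_needed(total_tags: int, tags_per_clone: int = 20) -> int:
    """Number of clones to sequence, rounded up to a whole clone."""
    return math.ceil(total_tags / tags_per_clone)


def gel_runs(total_tags: int, tags_per_clone: int = 20,
             templates_per_gel: int = 96) -> float:
    """Number of sequencing gels (unrounded) needed to record total_tags."""
    return clones_needed(total_tags, tags_per_clone) / templates_per_gel
```

In practice the result would be rounded up to whole gels.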

The most difficult projects are those aiming to compare gene expression profiles in the same tissue under physiological or pathological conditions. Differentially expressed genes could belong to any of the three abundance classes and, furthermore, they can be either up- or down-regulated. A reasonable number of tags to be sequenced should be in the range of 30,000-50,000. The probability (P) of detecting a sequence of a given abundance can be calculated from the Clarke and Carbon (17) equation, N = ln(1-P)/ln(1-x/n), where N is the number of sequenced tags, x is the expression level (copies/cell), and n is the total number of mRNAs per cell (~300,000). Thus, the analysis of 30,000 and 50,000 tags will provide a 95% confidence level of detecting transcripts expressed at 30 and 18 copies/cell, respectively. Most up-regulation processes will therefore be assessed. For example, tags corresponding to poorly expressed transcripts may be detected 1 and ≥5 times in control and experimental conditions, respectively. However, we have to be aware that the possibility of assessing down-regulation processes will be less exhaustive. It will only concern tags present >5-10 times in the control condition, which excludes part of the poorly expressed transcripts from the analysis.

In Table I above, which corresponds to the characterisation of the most abundant nuclear transcripts in the mouse outer medullary collecting duct (OMCD) and establishes their differential expression in the medullary thick ascending limb (MTAL), the two left columns correspond to the data illustrated in Figure 5 and provide the abundance of each tag in the two libraries. The third column provides the sequence of the tags. The right column indicates the results of individual BLAST searches in GenBank, carried out using a 14-bp sequence (the Sau3A I recognition sequence plus the 10 bp specific for each tag).
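The Clarke and Carbon equation and its inversion, x = n(1 - (1 - P)^(1/N)), can be checked numerically. The Python sketch below (function names hypothetical) reproduces the detection thresholds of about 30 and 18 copies/cell for 30,000 and 50,000 tags at P = 0.95 with n = 300,000.

```python
import math

MRNA_PER_CELL = 300_000  # n, total mRNAs per cell (~300,000)


def tags_required(copies_per_cell: float, p: float = 0.95,
                  n: int = MRNA_PER_CELL) -> float:
    """Clarke and Carbon: N = ln(1 - P) / ln(1 - x/n)."""
    return math.log(1 - p) / math.log(1 - copies_per_cell / n)


def detectable_copies(n_tags: int, p: float = 0.95,
                      n: int = MRNA_PER_CELL) -> float:
    """Inverse form: lowest expression level x detected with probability P."""
    return n * (1 - (1 - p) ** (1 / n_tags))
```

Rounding `detectable_copies(30_000)` and `detectable_copies(50_000)` gives 30 and 18 copies/cell, and `tags_required(30)` is just under 30,000 tags.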
Annex to the description



[0087] 



SEQUENCE LISTING 






<110> COMMISSARIAT A L'ENERGIE ATOMIQUE

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE 

<120> MICROASSAY FOR SERIAL ANALYSIS OF GENE EXPRESSION AND 
APPLICATIONS THEREOF. 

<130> BLOcp263EP51 

<140> 
<141> 

<160> 27 

<170> PatentIn Ver. 2.1

<210> 1 
<211> 9 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence : LINKER 

<400> 1 
gatcgtccc 



<210> 2 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence : LINKER 1A 
<400> 2 

ttttgccagg tcactcaagt cggtcattca tgtcagcaca gggac 



<210> 3 
<211> 46 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence : LINKER 1B
<400> 3 

gatcgtccct gtgctgacat gaatgaccga cttgagtgac ctggca 



<210> 4 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence : LINKER 2A
<400> 4

tttttgctca ggctcaaggc tcgtctaatc acagtcggaa gggac 



<210> 5 
<211> 46

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence : LINKER 2B 
<400> 5 

gatcgtccct tccgactgtg attagacgag ccttgagcct gagcaa 



<210> 6

<211> 24 
<212> DNA 

<213> Artificial Sequence 

<220>

<223> Description of Artificial Sequence : PRIMER 

<400> 6 

gccaggtcac tcaagtcggt catt 


<210> 7 
<211> 23 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> Description of Artificial Sequence : PRIMER 
<400> 7 

tgctcaggct caaggctcgt cta



<210> 8 
<211> 10 
<212> DNA 
<213> Mus sp. 

<400> 8 
gtggcagtgg 

<210> 9 

<211> 10 

<212> DNA 

<213> Mus sp. 
<400> 9
ttataatttg 10



<210> 10 
<211> 10 
<212> DNA 
<213> Mus sp.

<400> 10 
tggcagtggg 10



<210> 11 
<211> 10 
<212> DNA 
<213> Mus sp. 
<400> 11
tgactccctc 10



<210> 12 
<211> 10 
<212> DNA 
<213> Mus sp. 

<400> 12 
aagtttaaat 10



<210> 13 
<211> 10 
<212> DNA 
<213> Mus sp. 

<400> 13 
agcaagcagg 10



<210> 14 
<211> 10 
<212> DNA 
<213> Mus sp. 

<400> 14 
caaaaagcta 10

<210> 15
<211> 10 
<212> DNA 
<213> Mus sp. 

<400> 15 
acattcctta 10

<210> 16
<211> 10 
<212> DNA 
<213> Mus sp. 
<400> 16
accgaccgca 10



<210> 17 
<211> 10 
<212> DNA 
<213> Mus sp. 






<400> 17 
cagaagaagt 10



<210> 18 
<211> 10 
<212> DNA 
<213> Mus sp. 






<400> 18
aaataaagtt 10



<210> 19 
<211> 10 
<212> DNA 
<213> Mus sp. 



<400> 19 
agaagcagtg 10



<210> 20 
<211> 10 
<212> DNA 
<213> Mus sp. 

<400> 20 
tgatgccctc 10



<210> 21 
<211> 10
<212> DNA 
<213> Mus sp. 






<400> 21 
aggctactac 10



<210> 22 
<211> 10 
<212> DNA 
<213> Mus sp. 
<400> 22
gctcattgga 10



<210> 23 
<211> 10 
<212> DNA 
<213> Mus sp. 

<400> 23 
gctttcagca 



<210> 24 

<211> 10

<212> DNA 

<213> Mus sp. 

<400> 24 
gtgactgggt



<210> 25 
<211> 10 
<212> DNA 
<213> Mus sp. 



<400> 25 
tgaccaaggc 



<210> 26 
<211> 10 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence : linker 

<400> 26 
gtccctgtgc 



<210> 27 
<211> 10 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> Description of Artificial Sequence : linker 

<400> 27

gtcccttccg 
Claims

1. Method of obtaining a library of tags able to define a specific state of a biological sample, characterised in that it 
comprises the following successive steps: 

(1) extracting in a single step mRNA from a small amount of a biological sample using oligo(dT)25 covalently bound to paramagnetic beads,

(2) generating a double strand cDNA library, from said mRNA according to the following steps: 

- synthesising the 1st strand of said cDNA by reverse transcription of said mRNA template into a 1st complementary single-strand cDNA, using a reverse transcriptase lacking RNase H activity,
- synthesising the 2nd strand of said cDNA by nick translation of the mRNA, in the mRNA-cDNA hybrid form, by an E. coli DNA polymerase,

(3) cleaving the obtained cDNAs using the restriction endonuclease Sau3A I as anchoring enzyme,

(4) separating the cleaved cDNAs in two aliquots, 

(5) ligating the cDNA contained in each of said two aliquots via said Sau3A I restriction site to a linker consisting of one double-strand cDNA molecule having one of the following formulas:

GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and
wherein GATCGTCCC corresponds to a Sau3A I restriction site joined to a BsmF I restriction site,

(6) digesting the products obtained in step (5) with the tagging enzyme BsmF I and releasing linkers with 
anchored short piece of cDNA corresponding to a transcript-specific tag, said digestion generating BsmF I 
tags specific of the initial mRNA, 
(7) blunt-ending said BsmF I tags with a DNA polymerase, preferably T7 DNA polymerase or Vent polymerase, and mixing the tags ligated with the different linkers,

(8) ligating the tags obtained in step (7) to form ditags with a DNA ligase, 

(9) amplifying the ditags obtained in step (8) with primers comprising 20-25 bp and having a Tm of 55°C-65°C,

(10) isolating the ditags having between 20 and 28 bp from the amplification products obtained in step (9) by digesting said amplification products with the anchoring enzyme Sau3A I and separating the digested products on an appropriate gel electrophoresis,

(11) ligating the ditags obtained in step (10) to form concatemers, purifying said concatemers and separating the concatemers having more than 300 bp,

(12) cloning and sequencing said concatemers and
(13) analysing the different obtained tags.

2. Method according to claim 1, characterised in that in step (2), said synthesis of the 1st strand of said cDNA is performed with Moloney Murine Leukaemia Virus reverse transcriptase (M-MLV RT), and oligo(dT)25 as primers.

3. Method according to claim 1 or to claim 2, characterised in that the linkers of step (5) are preferably hybrid DNA molecules formed from linkers 1A and 1B or from linkers 2A and 2B, having the following formulas:

linker 1A: 5'-ttttgccaggtcactcaagtcggtcattcatgtcagcacagggac-3'
linker 1B: 5'-gatcgtccctgtgctgacatgaatgaccgacttgagtgacctggca-3'



or

linker 2A: 5'-tttttgctcaggctcaaggctcgtctaatcacagtcggaagggac-3'
linker 2B: 5'-gatcgtcccttccgactgtgattagacgagccttgagcctgagcaa-3'.

4. Method according to claims 1 to 3, characterised in that the amount of each linker in step (5) is at most 8-10 pmol and preferably between 0.5 pmol and 8 pmol for initial amounts of respectively 10-40 ng of mRNAs and 5 µg of mRNAs.

5. Method according to claims 1 to 4, characterised in that the primers of step (9) preferably have the following formulas:

5'-GCCAGGTCACTCAAGTCGGTCATT-3',

5'-TGCTCAGGCTCAAGGCTCGTCTA-3'.

6. Method according to claims 1 to 5, characterised in that the biological sample of step (1) preferably comprises ≤5 × 10^6 cells, corresponding to at most 50 µg of total RNA or 1 µg of poly(A) RNA.

7. Method according to claims 1 to 6, characterised in that said tissue sample is from kidney, more specifically from nephron segments corresponding to about 15,000 to 45,000 cells, corresponding to 0.15-0.45 µg of total RNA.

8. Use of a library of tags obtained according to the method of claims 1 to 7, for assessing the state of a biological 
sample. 



9. Use of the tags obtained according to claims 1 to 7 as probes. 

10. Method of determination of a gene expression profile, characterised in that it comprises:

- performing steps (1) to (13) according to claim 1 and

- translating cDNA tag abundance into a gene expression profile.

11. Method according to claim 10, characterised in that the gene expression profile obtained in mouse outer medullary collecting duct (OMCD) and in mouse medullary thick ascending limb (MTAL) is as specified in Table I.

12. A kit useful for detection of a gene expression profile, characterised in that the presence of a cDNA tag obtained from the mRNA extracted from a biological sample is indicative of expression of a gene having said tag sequence at an appropriate position, i.e. immediately adjacent to the most 3' Sau3A I site in said cDNA, the kit comprising, in addition to the usual buffers for cDNA synthesis, restriction enzyme digestion, ligation and amplification:

- containers containing a linker consisting of one double-strand cDNA molecule having one of the following formulas:

GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and

wherein GATCGTCCC corresponds to a Sau3A I restriction site joined to a BsmF I restriction site, and
- containers containing primers comprising 20-25 bp and having a Tm of 55°C-65°C.
13. Kit according to claim 12, characterised in that it preferably contains:

- containers containing hybrid DNA molecules formed from linkers 1A and 1B or from linkers 2A and 2B, having the following formulas:
linker 1A: 5'-ttttgccaggtcactcaagtcggtcattcatgtcagcacagggac-3'
linker 1B: 5'-gatcgtccctgtgctgacatgaatgaccgacttgagtgacctggca-3'

linker 2A: 5'-tttttgctcaggctcaaggctcgtctaatcacagtcggaagggac-3'
linker 2B: 5'-gatcgtcccttccgactgtgattagacgagccttgagcctgagcaa-3'

and

- containers containing the following primers:

5'-GCCAGGTCACTCAAGTCGGTCATT-3'
5'-TGCTCAGGCTCAAGGCTCGTCTA-3'.